Let $X = C + E$ with a deterministic matrix $C \in \mathbb{R}^{M\times M}$ and $E$ some centered Gaussian $M\times M$-matrix whose entries are independent with variance $\sigma^2$. In the present work, the accuracy of reduced-rank projections of $X$ is studied. Non-asymptotic universal upper and lower bounds are derived, and favorable and unfavorable prototypes of matrices $C$ in terms of the accuracy of approximation are characterized. The approach does not involve analytic perturbation theory of linear operators and allows for multiplicities in the singular value spectrum. Our main result is a general non-asymptotic upper bound on the accuracy of approximation which involves explicitly the singular values of $C$, and which is shown to be sharp in various regimes of $C$. The results are accompanied by lower bounds under diverse assumptions. Consequences on statistical estimation problems, in particular in the recent area of low-rank matrix recovery, are discussed.



1. Introduction. As a consequence of the Bai and Yin (1993) law, the maximal singular value $\lambda_{\max}(E)$ of an iid standard Gaussian $M\times M$-matrix $E$ is equal to $2\sqrt{M}(1+o(1))$ a.s. Since in addition the sequence $\lambda_{\max}(E)/\sqrt{M}$ is uniformly integrable (Johnson and Lindenstrauss (2001), Chapter 8, Theorem 2.4), the corresponding bound holds in expectation as well. Similarly, $\mathbb{E}\lambda_{\max}(E)^2 = 4M(1+o(1))$.

Let $\|\cdot\|_{S_2}$ denote the Hilbert-Schmidt or Frobenius norm. Define $\hat\pi_1$ to be the orthogonal projection matrix onto the one-dimensional subspace of $\mathbb{R}^M$ maximizing $\|\pi_1 E\|_{S_2}^2$ over all one-dimensional orthogonal projections $\pi_1$. Rewriting $\lambda_{\max}(E)^2 = \|\hat\pi_1 E\|_{S_2}^2$ yields

$$\mathbb{E}\|\hat\pi_1 E\|_{S_2}^2 = 4M(1+o(1)). \tag{1.1}$$

In contrast, $\mathbb{E}\|\pi_1 E\|_{S_2}^2 = M$ for every fixed $\pi_1$. Thus, replacing one single projection by the supremum over all projections increases the expected squared Hilbert-Schmidt norm by a factor tending to four:

$$\mathbb{E}\|\hat\pi_1 E\|_{S_2}^2 - \mathbb{E}\|\pi_1 E\|_{S_2}^2 = 3(1+o(1))M. \tag{1.2}$$
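As an informal numerical illustration of (1.1) and (1.2), the following Monte Carlo sketch (assuming NumPy; $\sigma = 1$; the fixed unit vector $u$ is a hypothetical choice) compares the supremum over rank-one projections, computed as the squared largest singular value, with a fixed projection $\pi_1 = uu'$. It is a sanity check, not part of the argument; for finite $M$ the first ratio stays somewhat below 4.

```python
import numpy as np

rng = np.random.default_rng(0)
M, reps = 150, 100
u = np.zeros(M)
u[0] = 1.0                                   # fixed unit vector, pi_1 = u u'
sup_vals, fixed_vals = [], []
for _ in range(reps):
    E = rng.standard_normal((M, M))          # iid N(0, 1) entries, sigma = 1
    sup_vals.append(np.linalg.norm(E, ord=2) ** 2)   # sup over pi_1 = lambda_max(E)^2
    fixed_vals.append(np.sum((u @ E) ** 2))          # ||u u' E||_{S_2}^2 = ||u' E||_2^2
sup_ratio = np.mean(sup_vals) / M      # (1.1): tends to 4 as M grows
fixed_ratio = np.mean(fixed_vals) / M  # expectation exactly 1
print(sup_ratio, fixed_ratio, sup_ratio - fixed_ratio)
```

The difference of the two printed ratios is the finite-$M$ analogue of the constant 3 in (1.2).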

This effect raises the question about the accuracy of empirical reduced-rank projections in general. Consider the model

$$X = C + E \tag{1.3}$$

with a deterministic matrix $C \in \mathbb{R}^{M\times M}$ and $E$ some centered Gaussian $M\times M$-matrix whose entries are independent with variance $\sigma^2$. Here and subsequently,




A. ROHDE 



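let

$$\hat\pi_r := \operatorname*{Argmax}_{\pi_r\in\mathcal{S}_{M,r}} \|\pi_r X\|_{S_2}^2 \quad\text{and}\quad \bar\pi_r \in \operatorname*{Argmax}_{\pi_r\in\mathcal{S}_{M,r}} \mathbb{E}\|\pi_r X\|_{S_2}^2 \tag{1.4}$$

with $\mathcal{S}_{M,r}$ denoting the set of all $M\times M$-matrices representing orthogonal projections onto $r$-dimensional linear subspaces of $\mathbb{R}^M$. How close is $\mathbb{E}\|\hat\pi_r X\|_{S_2}^2$ to its deterministic counterpart $\mathbb{E}\|\bar\pi_r X\|_{S_2}^2 = \|\bar\pi_r C\|_{S_2}^2 + \sigma^2 rM$ if the Gaussian matrix $X$ is not centered? For every fixed $M \in \mathbb{N}$ and $\sigma^2 > 0$, the following questions are natural: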



(A) Does there exist some favorable matrix $C = \mathbb{E}X$ for which the accuracy of approximation $\mathbb{E}\|\hat\pi_r X\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r X\|_{S_2}^2$ improves over the situation described in (1.2)?

(B) Does there exist, for any arbitrarily large real number $c$, some unfavorable matrix $C(c)$ such that $\mathbb{E}\|\hat\pi_r X\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r X\|_{S_2}^2 > c$?

Based on the random variable $X = C + E$ within model (1.3), denote the difference by

$$\delta_{C,M,\sigma^2,r} := \mathbb{E}\|\hat\pi_r X\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r X\|_{S_2}^2, \tag{1.5}$$

which, in terms of singular values, is equal to

$$\sum_{i=1}^r \mathbb{E}\big(\hat\lambda_i^2 - \lambda_i^2\big) - \sigma^2 rM,$$

where $\hat\lambda_1 \ge \hat\lambda_2 \ge \dots \ge \hat\lambda_M$ and $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_M$ denote the singular values of $X$ and $C$, respectively. The goal in the present article is to study this quantity $\delta_{C,M,\sigma^2,r}$, to derive universal upper and lower bounds, and to characterize favorable and unfavorable types of matrices $C$ in terms of the accuracy of approximation $\delta_{C,M,\sigma^2,r}$.
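A minimal Monte Carlo sketch of $\delta_{C,M,\sigma^2,r}$, assuming NumPy and a user-chosen number of replications. As a variance-reduction choice of ours (not taken from the text), it subtracts $\|\bar\pi_r X\|_{S_2}^2$ sample by sample instead of the deterministic value $\sum_{i\le r}\lambda_i^2 + \sigma^2 rM$; both have the same expectation, so the estimator is unbiased for (1.5).

```python
import numpy as np

def delta_mc(C, sigma, r, reps=200, seed=0):
    """Monte Carlo estimate of delta_{C,M,sigma^2,r} from (1.5).

    Averages sum_{i<=r} lam_hat_i^2 - ||pi_bar_r X||_F^2 over draws of X = C + E,
    where pi_bar_r projects onto r leading left singular vectors of C. Its mean
    equals delta because E||pi_bar_r X||_F^2 = ||pi_bar_r C||_F^2 + sigma^2 r M.
    """
    rng = np.random.default_rng(seed)
    M = C.shape[0]
    U_r = np.linalg.svd(C)[0][:, :r]        # orthonormal basis of the range of pi_bar_r
    diffs = []
    for _ in range(reps):
        X = C + sigma * rng.standard_normal((M, M))
        lam_hat = np.linalg.svd(X, compute_uv=False)      # decreasing order
        diffs.append(np.sum(lam_hat[:r] ** 2) - np.linalg.norm(U_r.T @ X) ** 2)
    return float(np.mean(diffs))

M = 50
delta_zero = delta_mc(np.zeros((M, M)), sigma=1.0, r=1)
print(delta_zero / M)   # (1.2) gives 3 asymptotically; moderate M gives a somewhat smaller value
```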

The motivation for considering this problem is two-fold. First of all, as

$$\mathbb{E}\|\hat\pi_r X\|_{S_2}^2 = \big(\mathbb{E}\|\hat\pi_r X\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r X\|_{S_2}^2\big) + \|\bar\pi_r C\|_{S_2}^2 + \sigma^2 rM$$

and $\mathbb{E}\|\hat\pi_r X\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r X\|_{S_2}^2 = \delta_{C,M,\sigma^2,r} \ge 0$, the problem is of theoretical interest, as our results complement the bound in (1.1) for centered Gaussian matrices with a detailed non-asymptotic analysis of the non-centered case, extending also to more general rank-$r$ projections. Finite-rank perturbations of random matrices have recently attracted a lot of attention, see Capitaine et al. (2009), Capitaine et al. (2012), Pizzo et al. (2012), Tao (2012) among others. Tao (2012), Theorem 1.7, studies the eigenvalue spectrum of low-rank perturbations of an iid complex random matrix and proves, as a special case, that $\gamma_{\max}(C + E/(\sigma\sqrt{M})) = \gamma_{\max}(C) + o_P(1)$ as $M \to \infty$ and $\operatorname{rank}(C) = O(1)$, as long as $|\gamma_{\max}(C)| = O(1)$ is sufficiently large, with $\gamma_{\max}(C)$ the eigenvalue of $C$ which is maximal in absolute value. Capitaine et al. (2009) and Pizzo et al. (2012) study Wigner matrices instead of iid random matrices. Somewhat remarkably, the outlier eigenvalues of the perturbed matrix are not close in probability to those of the original matrix $C$ but to some shifted value $\lambda_i(C) + \sigma^2/\lambda_i(C)$, where $\sigma^2$ is the common variance of the entries of the Wigner matrix, and $\lambda_i(C)$ are the eigenvalues of a Hermitian matrix $C$. Our results are complementary:

• We derive non-asymptotic cumulated second moment bounds on the singular values in the deformed (non-Hermitian) iid real Gaussian matrix case, i.e. the noise level $\sigma^2$ and the dimension $M$ are fixed but arbitrary throughout the analysis, and the constants involved in our bounds do not depend on them.

• The perturbation matrix $C$ is not required to be of low or uniformly bounded rank; for example, our results cover the case $\operatorname{rank}(C) = \lfloor M/2 \rfloor$ or $\operatorname{rank}(C) = M$.

• Our proofs differ significantly from the techniques of the above-mentioned results: they rely on empirical process techniques without making use of classical random matrix tools. The novelty in the proof of the subsequent Theorem 5.1 is that a slicing argument is used for bounding the expectation of the supremum over some non-centered process.

Although our results extend without difficulties to the self-adjoint dilation $\tilde X$ of $X$ in $\mathbb{R}^{2M\times 2M}$, it remains open whether, in an appropriate asymptotic sense, the eigenvalue spectrum of $\tilde X$ behaves similarly to the deformed Wigner case as studied by Capitaine et al. (2009) and Pizzo et al. (2012) with finite-rank perturbations, as their assumptions do not apply to this setting.
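The self-adjoint dilation is not written out explicitly here. Assuming the usual block form $\tilde X = \begin{pmatrix} 0 & X \\ X' & 0\end{pmatrix}$, a short NumPy check shows that its eigenvalues are exactly $\pm\hat\lambda_1, \dots, \pm\hat\lambda_M$, which makes the link between singular values of $X$ and eigenvalues of $\tilde X$ explicit.

```python
import numpy as np

rng = np.random.default_rng(6)
M = 4
X = rng.standard_normal((M, M))
# Self-adjoint dilation of X (assumed standard block form), a symmetric 2M x 2M matrix
X_dil = np.block([[np.zeros((M, M)), X],
                  [X.T, np.zeros((M, M))]])
eigs = np.sort(np.linalg.eigvalsh(X_dil))
svals = np.linalg.svd(X, compute_uv=False)
# spectrum of the dilation should equal {+lambda_i(X)} together with {-lambda_i(X)}
same = np.allclose(eigs, np.sort(np.concatenate([svals, -svals])))
print(same)
```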

As concerns applicability in mathematical statistics, the study of $\mathbb{E}\|\hat\pi_r X\|_{S_2}^2 = \sum_{i=1}^r \mathbb{E}\hat\lambda_i^2$ arises naturally when inferring about quantities like

$$\|C\|_{S_2}^2 - \|\bar\pi_r C\|_{S_2}^2 \quad\text{or}\quad \operatorname*{argmin}_{r\in\mathbb{N}}\Big\{ \frac{\|\bar\pi_r C\|_{S_2}^2}{\|C\|_{S_2}^2} \ge \alpha \Big\},\quad \alpha\in(0,1], \tag{1.6}$$

which are of interest for analyzing and understanding the singular value spectrum, in particular in high dimension. As above and subsequently, for any matrix $C \in \mathbb{R}^{M\times M}$, its singular values $\lambda_1, \dots, \lambda_M$ are ordered in decreasing magnitude. In terms of singular values,

$$\|C\|_{S_2}^2 = \sum_{i=1}^M \lambda_i^2 \quad\text{and}\quad \|\bar\pi_r C\|_{S_2}^2 = \sum_{i=1}^r \lambda_i^2.$$

If $C = \sum_{i=1}^M \lambda_i U_i V_i'$ denotes some singular value decomposition of $C$, where $U_1, \dots, U_M$ and $V_1, \dots, V_M$ are two sets of orthonormal vectors in $\mathbb{R}^M$, then the maximizer

$$\operatorname*{Argmax}_{\pi_r\in\mathcal{S}_{M,r}} \mathbb{E}\|\pi_r(C+E)\|_{S_2}^2 = \operatorname*{Argmax}_{\pi_r\in\mathcal{S}_{M,r}} \|\pi_r C\|_{S_2}^2$$

is unique if and only if $\lambda_r > \lambda_{r+1}$, in which case it is equal to the orthogonal projection $\sum_{i=1}^r U_i U_i'$ onto the linear space spanned by the orthonormal column vectors $U_1, \dots, U_r$, and $\bar\pi_r C = \sum_{i=1}^r \lambda_i U_i V_i'$. In the context of covariance matrices, the ratio $\|\bar\pi_r C\|_{S_2}^2/\|C\|_{S_2}^2$ is often referred to as the percentage of the "explained variance" by the first $r$ principal components, and the second expression in (1.6) determines the smallest number of principal components needed to explain a prescribed percentage $\alpha$ of the overall variance. Within our model, the statistic $\|\bar\pi_r X\|_{S_2}^2 - \sigma^2 rM$ estimates the expression $\|\bar\pi_r C\|_{S_2}^2$ in (1.6) unbiasedly, but note that $\bar\pi_r = \bar\pi_r(C)$ is not available in advance, as $C$ itself and in particular its singular spaces are unknown. Thus, the first question in the analysis is whether the empirical counterpart

$$\|\hat\pi_r X\|_{S_2}^2 - \sigma^2 rM = \sum_{i=1}^r \hat\lambda_i^2 - \sigma^2 rM$$

does the job as well, where $\hat\lambda_1, \dots, \hat\lambda_r$ denote the $r$ largest singular values of $X$. Our detailed analysis of this problem will show that its bias depends strongly on the unknown matrix $C$ itself, and even for favorable rank-$r$ matrices $C = \mathbb{E}X$ of arbitrarily large amplitude and rank-$r$ projections, the bias $\mathbb{E}\|\hat\pi_r X\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r X\|_{S_2}^2$ remains of the order $\sigma^2 r(M-r)$, which is shown to be unimprovable in general. Note at this point that $\mathbb{E}\|\hat\pi_r X\|_{S_2}^2 - \|\bar\pi_r C\|_{S_2}^2 \ge \mathbb{E}\|\bar\pi_r X\|_{S_2}^2 - \|\bar\pi_r C\|_{S_2}^2 = \sigma^2 rM$, but there is a priori no reason why the difference

$$\mathbb{E}\|\hat\pi_r X\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r X\|_{S_2}^2$$

cannot be of substantially smaller order for "good" choices of $C$, see question (A). In order to keep the technical expenditure as small as possible, we consider the model $X = C + E$ as mentioned above, but we conjecture that similar non-asymptotic implications (somewhat different and still to be derived for the Wishart case) will be valid for the squared Hilbert-Schmidt norm of rank-$r$ projections of high-dimensional empirical covariance matrices $YY'$, with $Y \sim \mathcal{N}(0, C)$ for some positive semidefinite matrix $C$, with consequences on the robustness properties of principal component analysis.
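The oracle quantities in (1.6) and the plug-in statistic can be computed directly from singular values. The sketch below (NumPy assumed; the spectrum $30, 20, 0, \dots, 0$, the noise level and $\alpha = 0.9$ are arbitrary illustrative choices) evaluates the second expression in (1.6) and estimates by simulation the bias of $\sum_{i\le r}\hat\lambda_i^2 - \sigma^2 rM$ as an estimator of $\|\bar\pi_r C\|_{S_2}^2$. For this well-separated rank-2 example the estimated bias should come out positive and of the order $\sigma^2 r(M-r)$.

```python
import numpy as np

def explained_variance_rank(lam, alpha):
    """Second expression in (1.6): smallest r with sum_{i<=r} lam_i^2 / sum_i lam_i^2 >= alpha."""
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    ratios = np.cumsum(lam**2) / np.sum(lam**2)
    return int(np.argmax(ratios >= alpha - 1e-12)) + 1   # small tolerance for rounding

def plug_in(X, r, sigma):
    """Empirical counterpart sum_{i<=r} lam_hat_i^2 - sigma^2 r M of ||pi_bar_r C||_{S_2}^2."""
    M = X.shape[0]
    lam_hat = np.linalg.svd(X, compute_uv=False)
    return np.sum(lam_hat[:r] ** 2) - sigma**2 * r * M

rng = np.random.default_rng(7)
M, r, sigma = 40, 2, 1.0
lam = np.array([30.0, 20.0] + [0.0] * (M - 2))
C = np.diag(lam)
oracle = np.sum(lam[:r] ** 2)                        # ||pi_bar_r C||_{S_2}^2
estimates = [plug_in(C + sigma * rng.standard_normal((M, M)), r, sigma) for _ in range(200)]
bias = np.mean(estimates) - oracle                   # Monte Carlo estimate of delta_{C,M,sigma^2,r}
print(explained_variance_rank(lam, 0.9), oracle, bias)
```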

In the simplest special cases where the question is non-trivial, the main findings of the article can be summarized as follows:

Theorem 1.1 (Prototype of weak accuracy). Let $C_a = a\,\mathrm{Id}$ for some arbitrary real number $a \in \mathbb{R}$, where $\mathrm{Id}$ denotes the $M\times M$ identity matrix. Then the following statements hold true:
(i) The case $C = C_a = a\,\mathrm{Id}$, $a \ne 0$, is always worse than the case $C = 0$:
$$\delta_{C_a,M,\sigma^2,r} \ge \delta_{0,M,\sigma^2,r}.$$
(ii) The difference $\delta_{C_a,M,\sigma^2,r}$ explodes at least linearly in the amplitude $|a|$: There exists some constant $c_1 > 0$, independent of $\sigma^2$, $r$ and $M$, such that
$$\liminf_{|a|\to\infty} \frac{\delta_{C_a,M,\sigma^2,r}}{|a|} \ge c_1\,\sigma r\sqrt{M-r} \quad\text{for all } r \le M-r.$$
Theorem 1.2 (Universal upper and lower bound in case $r = 1$). (i) There exists some constant $c_2 > 0$ independent of $C$, $M$, $\sigma^2$, such that for all $C \in \mathbb{R}^{M\times M}$ and $M \ge 2$
$$\delta_{C,M,\sigma^2,1} \le c_2\big(\sigma^2 M + \sigma\sqrt{M}\,\|C\|_{S_\infty}\big), \tag{1.7}$$
where $\|C\|_{S_\infty}$ denotes the spectral norm of $C$.
(ii) There exist constants $c_3 > 0$ and $M_0 \in \mathbb{N}$, independent of $\sigma^2$, such that
$$\inf_{C\in\mathbb{R}^{M\times M}} \delta_{C,M,\sigma^2,1} \ge c_3\,\sigma^2 (M-1) \quad\text{for all } M \ge M_0. \tag{1.8}$$

Theorem 1.3 (Prototype of high accuracy). Let $C_{a,s} = \operatorname{diag}(a, \dots, a, 0, \dots, 0)$ with $\operatorname{rank}(C_{a,s}) = s$. Then the following statements hold true:
(i) There exists some constant $c_4 > 0$ independent of $a$, $r$, $M$ and $\sigma^2$, such that
$$\delta_{C_{a,r},M,\sigma^2,r} \le c_4\,\sigma^2 rM \quad\text{for } r \le M-r \text{ and every } a \in \mathbb{R}. \tag{1.9}$$
(ii) There exists some constant $c_5 > 0$ independent of $M$, $r$ and $\sigma^2$, such that the bound (1.9) is asymptotically sharp in the following sense:
$$\liminf_{|a|\to\infty} \max_{s\in\{r, M-r\}} \delta_{C_{a,s},M,\sigma^2,s} \ge c_5\,\sigma^2 r(M-r).$$
The same result holds true even without the maximum over $\{1, M-1\}$ for $s = r = 1$.

Theorem 1.3 (i) describes a special case of the more general upper bound in Theorem 5.1, which applies to every matrix $C \in \mathbb{R}^{M\times M}$ with $\operatorname{rank}(C) \ge r$. In case that $\lambda_i = \alpha$ for $i \le r$ and $\lambda_i = \beta < \alpha$ for $i > r$, the bound of Theorem 5.1 approaches the unimprovable upper bound for matrices $C$ of constant singular value spectrum as $\beta \to \alpha$.

The article is organized as follows. In Section 2, we introduce the notation and describe some basic observations about $\delta_{C,M,\sigma^2,r}$. Prototypes of matrices $C$ of weak accuracy and first lower bounds are studied in Section 3. The supremum over the centered differences

$$\sup_{\pi_r\in\mathcal{S}_{M,r}} \Big( \|\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\pi_r(C+E)\|_{S_2}^2 - \big[\|\bar\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r(C+E)\|_{S_2}^2\big] \Big)$$

is analyzed in Section 4. The process of centered differences and modifications thereof are central for our analysis. Our main results are given in Section 5. Our general idea on how to derive potentially sharp upper bounds on $\delta_{C,M,\sigma^2,r}$ for general $M\times M$-matrices $C$ is described at the beginning of that section. The upper bounds are complemented with lower bounds in Section 6. Consequences on statistical estimation problems, in particular in the recent area of low-rank matrix recovery, are discussed in Section 7. Section 8 is devoted to the proof of Theorem 5.1. The proof of Theorem 6.2 is deferred to Section 9.
2. Preliminaries.

2.1. Notation. The notation $\lesssim$ means less or equal up to some non-negative multiplicative constant which does not depend on the variable parameters in the expression. $A \sim B$ should be read as $A \lesssim B$ and $B \lesssim A$ at once. If not stated otherwise, $E$ is a centered Gaussian matrix whose entries are independent with variance $\sigma^2$. Subsequently, $\|\cdot\|_{S_p}$, $1 \le p \le \infty$, denotes the Schatten-$p$-norm on $\mathbb{R}^{M\times M}$, i.e. for any $C \in \mathbb{R}^{M\times M}$, $\|C\|_{S_p}$ coincides with the $\ell_p$-norm of its singular values $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_M$:

$$\|C\|_{S_p} = \Big(\sum_{i=1}^M \lambda_i^p\Big)^{1/p} \quad\text{for } 1 \le p < \infty, \qquad \|C\|_{S_\infty} = \lambda_1.$$

Specifically, $\|\cdot\|_{S_1}$, $\|\cdot\|_{S_2}$ and $\|\cdot\|_{S_\infty}$ are referred to as nuclear norm, Hilbert-Schmidt or Frobenius norm, and spectral norm, respectively. $\operatorname{tr}(C)$ denotes the trace of $C \in \mathbb{R}^{M\times M}$. For any $A \in \mathbb{R}^{d\times M}$, $A'$ denotes its transpose in $\mathbb{R}^{M\times d}$. $\mathrm{Id}$ denotes the $M\times M$ identity matrix, and $\mathrm{Id}_r$ the diagonal matrix $\operatorname{diag}(1, \dots, 1, 0, \dots, 0)$ of rank $r$. As usual, $O(M)$ denotes the orthogonal group, i.e. the group of orthogonal $M\times M$-matrices. For any totally bounded pseudometric space $(\mathcal{X}, d)$ and any subset $\mathcal{E} \subset \mathcal{X}$, the covering number $N(\mathcal{E}, d, \delta)$ is the smallest number of closed $d$-balls in $\mathcal{X}$ of radius $\delta$ needed to cover $\mathcal{E}$.
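For reference, the Schatten norms can be computed from the singular values exactly as defined above. A small NumPy sketch (the helper name `schatten_norm` is ours); for $p = 1, 2, \infty$ it should agree with NumPy's built-in nuclear, Frobenius and spectral norms.

```python
import numpy as np

def schatten_norm(C, p):
    """Schatten-p norm ||C||_{S_p}: the l_p-norm of the singular values (p = np.inf: spectral norm)."""
    s = np.linalg.svd(C, compute_uv=False)
    if np.isinf(p):
        return float(s[0])
    return float(np.sum(s**p) ** (1.0 / p))

C = np.diag([3.0, -4.0])              # singular values 4 and 3
print(schatten_norm(C, 1), schatten_norm(C, 2), schatten_norm(C, np.inf))
```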

2.2. Some basic observations about $\delta_{C,M,\sigma^2,r}$. The following representation clarifies the problem under consideration. For an arbitrary matrix $A \in \mathbb{R}^{M\times M}$ with $\|A\|_{S_\infty} = 1$ and $a \in \mathbb{R}$, inspection of the quantity $\delta_{aA,M,\sigma^2,r}$ shows

$$\begin{aligned}
\delta_{aA,M,\sigma^2,r} &= \mathbb{E}\|\hat\pi_r(aA+E)\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r(aA+E)\|_{S_2}^2\\
&= \mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( \|\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2 + 2a\operatorname{tr}\big(E'(\pi_r - \bar\pi_r)A\big) \underbrace{-\,a^2\big(\|\bar\pi_r A\|_{S_2}^2 - \|\pi_r A\|_{S_2}^2\big)}_{\text{compensation term}\;\le\;0} \Big).
\end{aligned}$$

The two processes

$$\big(\|\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2\big)_{\pi_r\in\mathcal{S}_{M,r}} \quad\text{and}\quad \big(a\operatorname{tr}(E'(\pi_r - \bar\pi_r)A)\big)_{\pi_r\in\mathcal{S}_{M,r}}$$

are centered, while the deterministic compensation term $-a^2(\|\bar\pi_r A\|_{S_2}^2 - \|\pi_r A\|_{S_2}^2)$ is less or equal to zero, for any choice of $A$ and every $\pi_r \in \mathcal{S}_{M,r}$. Note that the stochastic term $a\operatorname{tr}(E'(\pi_r - \bar\pi_r)A)$ is linear in $a$, but the deterministic compensation term depends quadratically on $a$. This representation suggests that an interplay of amplitude $|a|$ and structure of $A$ determines the accuracy of approximation.
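The representation above rests on an algebraic identity that holds for every fixed draw of $E$ and every $\pi_r$, so it can be checked numerically. In this sketch (NumPy assumed; $M$, $r$, $a$ and the random $A$ are arbitrary choices), $\bar\pi_r$ is taken as the projection onto $r$ leading left singular vectors of $A$ and $\pi_r$ as a random rank-$r$ projection; the script verifies the identity and the sign of the compensation term.

```python
import numpy as np

rng = np.random.default_rng(1)
M, r, a = 6, 2, 3.0
A = rng.standard_normal((M, M))
A /= np.linalg.norm(A, 2)                      # ||A||_{S_inf} = 1
E = rng.standard_normal((M, M))                # one draw of the noise, sigma = 1

def fro2(B):
    """Squared Hilbert-Schmidt (Frobenius) norm."""
    return np.linalg.norm(B) ** 2

U = np.linalg.svd(A)[0]
pi_bar = U[:, :r] @ U[:, :r].T                 # maximiser of ||pi A||_{S_2}^2
Q = np.linalg.qr(rng.standard_normal((M, r)))[0]
pi = Q @ Q.T                                   # an arbitrary rank-r projection

lhs = fro2(pi @ (a * A + E)) - fro2(pi_bar @ (a * A + E))
compensation = -a**2 * (fro2(pi_bar @ A) - fro2(pi @ A))
rhs = (fro2(pi @ E) - fro2(pi_bar @ E)
       + 2 * a * np.trace(E.T @ (pi - pi_bar) @ A)
       + compensation)
print(np.isclose(lhs, rhs), compensation <= 1e-12)
```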
3. The prototype of weak accuracy: the case without deterministic compensation term. Recall the definition (1.5), with $\hat\pi_r$ and $\bar\pi_r$ as defined in (1.4). The first result is a lower bound on the expected squared Hilbert-Schmidt norm of the rank-$r$ projection in case that the singular value spectrum of $C$ is constant.

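PROPOSITION 3.1. Let $C_a \in \mathbb{R}^{M\times M}$ with singular value decomposition $U\Lambda_a V'$. Assume that $\Lambda_a = a\,\mathrm{Id}$ with some non-negative number $a \in \mathbb{R}$. Then

(i) $\delta_{C_a,M,\sigma^2,r} \ge \delta_{0,M,\sigma^2,r}$ for every $a \ge 0$,

and

(ii) $\displaystyle\liminf_{a\to\infty} \frac{\delta_{C_a,M,\sigma^2,r}}{a} \gtrsim r\sigma\sqrt{M-r}$ for any $r \le M-r$.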



Remark. Proposition 3.1 (i) demonstrates that the accuracy in case $C = a\,\mathrm{Id}$ is always worse than in case $C = 0$. Part (ii) complements this observation with an asymptotic lower bound: the difference $\delta_{C_a,M,\sigma^2,r}$ explodes at least linearly in the amplitude $a$. In particular, (ii) provides a positive answer to question (B) in the Introduction.

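PROOF. $Y \overset{\mathcal{D}}{=} Z$ for two random variables $Y$ and $Z$ means that their distributions coincide. Since $U'\pi_r U \in \mathcal{S}_{M,r}$ for any $\pi_r \in \mathcal{S}_{M,r}$, $U'EV \overset{\mathcal{D}}{=} E$ for two fixed orthogonal matrices $U$ and $V$, and $\|U'AV\|_{S_2}^2 = \|A\|_{S_2}^2$ for any $A \in \mathbb{R}^{M\times M}$, we may assume without loss of generality that $C = a\,\mathrm{Id}$. Let $\bar\pi_r$ be as given in (1.4), i.e. in this case, $\bar\pi_r$ denotes some arbitrary but fixed element of $\mathcal{S}_{M,r}$. First,

$$\|\hat\pi_r(a\,\mathrm{Id}+E)\|_{S_2}^2 - \|\bar\pi_r(a\,\mathrm{Id}+E)\|_{S_2}^2 = 2a\operatorname{tr}\big(E'(\hat\pi_r - \bar\pi_r)\big) + \|\hat\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2,$$

i.e. we need a lower bound on

$$\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( 2a\operatorname{tr}\big(E'(\pi_r - \bar\pi_r)\big) + \|\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2 \Big).$$

Note that the supremum within the expectation is non-negative simply because $\bar\pi_r \in \mathcal{S}_{M,r}$. Let $E^* \in \mathbb{R}^{M\times M}$ be the matrix with the entries $E^*_{ij} = -E_{ij}$, $i, j = 1, \dots, M$. By the symmetry of the Gaussian distribution, as processes indexed by $\pi_r \in \mathcal{S}_{M,r}$,

$$\begin{aligned}
2a\operatorname{tr}\big(E'(\pi_r - \bar\pi_r)\big) + \|\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2
&\overset{\mathcal{D}}{=} 2a\operatorname{tr}\big(E^{*\prime}(\pi_r - \bar\pi_r)\big) + \|\pi_r E^*\|_{S_2}^2 - \|\bar\pi_r E^*\|_{S_2}^2\\
&= -2a\operatorname{tr}\big(E'(\pi_r - \bar\pi_r)\big) + \|\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2.
\end{aligned}$$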
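Consequently,

$$\begin{aligned}
&\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( 2a\operatorname{tr}\big(E'(\pi_r - \bar\pi_r)\big) + \|\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2 \Big)\\
&\quad= \frac12\,\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( 2a\operatorname{tr}\big(E'(\pi_r - \bar\pi_r)\big) + \|\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2 \Big)\\
&\qquad+ \frac12\,\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( 2a\operatorname{tr}\big(E^{*\prime}(\pi_r - \bar\pi_r)\big) + \|\pi_r E^*\|_{S_2}^2 - \|\bar\pi_r E^*\|_{S_2}^2 \Big)\\
&\quad\ge \frac12\,\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( 2a\operatorname{tr}\big(E'(\pi_r - \bar\pi_r)\big) + 2a\operatorname{tr}\big(E^{*\prime}(\pi_r - \bar\pi_r)\big) + \|\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2 + \|\pi_r E^*\|_{S_2}^2 - \|\bar\pi_r E^*\|_{S_2}^2 \Big)\\
&\quad= \mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( \|\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2 \Big) = \delta_{0,M,\sigma^2,r},
\end{aligned}$$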

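which proves part (i) of the proposition. As concerns the proof of (ii), observe that

$$\begin{aligned}
&\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( 2a\operatorname{tr}\big(E'(\pi_r - \bar\pi_r)\big) + \|\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2 \Big)\\
&\quad\ge 2a\,\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}} \operatorname{tr}\big(E'(\pi_r - \bar\pi_r)\big) - \mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( \|\bar\pi_r E\|_{S_2}^2 - \|\pi_r E\|_{S_2}^2 \Big)\\
&\quad\ge 2a\,\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}} \operatorname{tr}\big(E'(\pi_r - \bar\pi_r)\big) - \sigma^2 rM,
\end{aligned}\tag{3.1}$$

since $\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}(\|\bar\pi_r E\|_{S_2}^2 - \|\pi_r E\|_{S_2}^2) \le \mathbb{E}\|\bar\pi_r E\|_{S_2}^2 = \sigma^2 rM$. By Sudakov's minoration, there exists some universal constant $c_{\mathrm{Sud}}$ such that

$$\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}} \operatorname{tr}\big(E'(\pi_r - \bar\pi_r)\big) \ge \sigma c_{\mathrm{Sud}}\,\delta\sqrt{\log N(\mathcal{S}_{M,r}, d_{S_2}, \delta)} \tag{3.2}$$

for any $\delta > 0$. Proposition 8 in Pajor (1998) states for any $r \le M-r$

$$\Big(\frac{c'}{\varepsilon}\Big)^{r(M-r)} \le N(\mathcal{S}_{M,r}, d_{S_2}, \varepsilon\sqrt{r}), \quad \forall\,\varepsilon > 0,$$

with some universal constant $c' > 0$ which does not depend on $r$ and $M$. Choosing $\delta = \sqrt{r}\,c'/e$, i.e. $\varepsilon = c'/e$, gives $\log N(\mathcal{S}_{M,r}, d_{S_2}, \delta) \ge r(M-r)$ and hence $\delta\sqrt{\log N(\mathcal{S}_{M,r}, d_{S_2}, \delta)} \ge (c'/e)\, r\sqrt{M-r}$. Plugging (3.2) into (3.1) thus yields, for some constant $c$ which does not depend on $\sigma^2$, $r$, $M$ and $a$:

$$\delta_{a\mathrm{Id},M,\sigma^2,r} \ge c\,\sigma a r\sqrt{M-r} - \sigma^2 rM.$$

Dividing both sides by $a$ and taking the limes inferior proves (ii). $\square$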

In view of the representation in Section 2, the case $C = a\,\mathrm{Id}$ is the prototype of weak accuracy, as there is no deterministic compensation term in the expression of the supremum. It follows from the subsequent Corollary 4.2 that the lower bound of Proposition 3.1 (ii) is sharp.
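A rough simulation of Proposition 3.1 for $r = 1$: for $C = a\,\mathrm{Id}$ the estimated $\delta_{C_a,M,\sigma^2,1}$ should exceed the value at $a = 0$ and grow roughly linearly in $a$. This is only an illustrative sketch (NumPy assumed; $M = 20$, the amplitudes and the replication count are arbitrary), using a paired Monte Carlo estimator of our choosing whose mean equals $\delta$. Since every rank-$r$ projection maximizes $\|\pi_r C\|_{S_2}^2$ here, any basis returned by the SVD serves as $\bar\pi_r$.

```python
import numpy as np

def delta_mc(C, sigma, r, reps, rng):
    """Monte Carlo estimate of delta_{C,M,sigma^2,r}: mean of sum_{i<=r} lam_hat_i^2 - ||pi_bar_r X||_F^2."""
    M = C.shape[0]
    U_r = np.linalg.svd(C)[0][:, :r]    # for C = a Id every rank-r projection is a maximiser
    total = 0.0
    for _ in range(reps):
        X = C + sigma * rng.standard_normal((M, M))
        lam_hat = np.linalg.svd(X, compute_uv=False)
        total += np.sum(lam_hat[:r] ** 2) - np.linalg.norm(U_r.T @ X) ** 2
    return total / reps

rng = np.random.default_rng(2)
M, r, sigma = 20, 1, 1.0
amplitudes = [0.0, 20.0, 80.0]
deltas = [delta_mc(a * np.eye(M), sigma, r, 300, rng) for a in amplitudes]
for a, d in zip(amplitudes, deltas):
    print(f"a = {a:5.1f}   delta ~ {d:8.1f}")
```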

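4. $S_2$-$S_\infty$-chaining bounds for the supremum over the centered process and first consequences on $\delta_{C,M,\sigma^2,r}$. Let $X = C + E$ as described in (1.3). Recall the definition from the Introduction

$$\bar\pi_r \in \operatorname*{Argmax}_{\pi_r\in\mathcal{S}_{M,r}} \mathbb{E}\|\pi_r X\|_{S_2}^2 = \operatorname*{Argmax}_{\pi_r\in\mathcal{S}_{M,r}} \|\pi_r C\|_{S_2}^2.$$

Because of $\mathbb{E}\|\bar\pi_r(C+E)\|_{S_2}^2 \ge \mathbb{E}\|\pi_r(C+E)\|_{S_2}^2$ for every $\pi_r \in \mathcal{S}_{M,r}$, it follows that

$$\begin{aligned}
&\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( \|\pi_r(C+E)\|_{S_2}^2 - \|\bar\pi_r(C+E)\|_{S_2}^2 \Big)\\
&\quad\le \mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( \|\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\pi_r(C+E)\|_{S_2}^2 - \big[\|\bar\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r(C+E)\|_{S_2}^2\big] \Big).
\end{aligned}\tag{4.1}$$

That is, the study of the supremum over the centered process

$$Z := \sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( \|\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\pi_r(C+E)\|_{S_2}^2 - \big[\|\bar\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r(C+E)\|_{S_2}^2\big] \Big)$$

yields a first (possibly very rough) estimate of $\delta_{C,M,\sigma^2,r}$ from above. Variants thereof are central for our subsequent analysis in Section 5.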

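Proposition 4.1. Let $(E_{ij})_{i,j=1}^M$ be a centered matrix of iid Gaussian entries with variance $\sigma^2$. Then there exists some constant $c > 0$ such that for every $1 \le r \le M$, $M \in \mathbb{N}$, and every $C \in \mathbb{R}^{M\times M}$

$$\mathbb{E}Z \le c\big(\sigma^2 rM + \sigma r\sqrt{M}\,\|C\|_{S_\infty}\big). \tag{4.2}$$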



The proof of Proposition 4.1 is deferred to the end of this section. We draw some first consequences on $\delta_{C,M,\sigma^2,r}$.

Corollary 4.2. (i) (Universal upper bound) For all $C \in \mathbb{R}^{M\times M}$,

$$\delta_{C,M,\sigma^2,r} \lesssim \sigma^2 rM + \sigma r\sqrt{M}\,\|C\|_{S_\infty}.$$

(ii) (Upper bound in the "small amplitude" regime)

$$\sup_{C\in\mathbb{R}^{M\times M}:\ \|C\|_{S_\infty} \le \sigma\sqrt{M}} \delta_{C,M,\sigma^2,r} \lesssim \sigma^2 rM.$$

PROOF. Part (i) of Corollary 4.2 follows from (4.1) and the bound on $\mathbb{E}Z$ given in Proposition 4.1; (ii) follows from (i).

Remark. It is worth mentioning that the bound on $\mathbb{E}Z$ grows linearly with $\|C\|_{S_\infty}$, and this linear dependence is optimal for matrices $X = a\,\mathrm{Id} + E$, $a \in \mathbb{R}$, in view of Proposition 3.1. In particular, the lower bound of Proposition 3.1 (ii) is sharp, and the universal upper bound cannot be improved without further structural assumptions on $C$.

The next lemma complements the upper bound of Corollary 4.2 (ii) in the small amplitude regime with a lower bound.

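Lemma 4.3. For any real constant $\kappa > 0$,

$$\inf_{C\in\mathbb{R}^{M\times M}:\ \|C\|_{S_\infty} \le \kappa\sigma\sqrt{M}} \delta_{C,M,\sigma^2,r} \ge \delta_{0,M,\sigma^2,r} - \kappa^2\sigma^2 rM.$$

PROOF. With the same symmetry argument as used in the proof of Proposition 3.1 (i), we obtain for any $C \in \mathbb{R}^{M\times M}$ with $\|C\|_{S_\infty} \le \kappa\sigma\sqrt{M}$

$$\begin{aligned}
\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( \|\pi_r(C+E)\|_{S_2}^2 - \|\bar\pi_r(C+E)\|_{S_2}^2 \Big)
&\ge \mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( \|\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2 + \|\pi_r C\|_{S_2}^2 - \|\bar\pi_r C\|_{S_2}^2 \Big)\\
&\ge \mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( \|\pi_r E\|_{S_2}^2 - \|\bar\pi_r E\|_{S_2}^2 \Big) - \kappa^2\sigma^2 rM,
\end{aligned}$$

where the last step uses $\|\bar\pi_r C\|_{S_2}^2 \le r\|C\|_{S_\infty}^2 \le \kappa^2\sigma^2 rM$. $\square$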

Remark. Together with the Bai-Yin bound in expectation (1.1), Lemma 4.3 implies in particular for $r = 1$ that there exists some $M_0 \in \mathbb{N}$, independent of $\sigma^2$, such that

$$\inf_{C\in\mathbb{R}^{M\times M}:\ \|C\|_{S_\infty} \le \sigma\sqrt{M}} \delta_{C,M,\sigma^2,1} \gtrsim \delta_{0,M,\sigma^2,1} \sim \sigma^2 M \quad\text{for all } M \ge M_0.$$

That is, in the small amplitude regime $\|C\|_{S_\infty} \le \sigma\sqrt{M}$, the accuracy is never better in order than in the case $C = 0$, independent of the specific structure of the matrix $C$.

PROOF OF PROPOSITION 4.1. For any $r$-dimensional subspace $U \subset \mathbb{R}^M$, let $P_U \in \mathcal{S}_{M,r}$ denote the orthogonal projection onto $U$. The proof is based on the classical generic chaining device. In order to make this technique applicable, we need to investigate pairwise differences of the centered process $Z^{\sigma,M,r}$, which is pointwise given by

$$Z^{\sigma,M,r}_{P_U} := \|P_U(C+E)\|_{S_2}^2 - \mathbb{E}\|P_U(C+E)\|_{S_2}^2 = \operatorname{tr}(E'P_U E) - rM\sigma^2 + 2\operatorname{tr}(C'P_U E), \quad P_U \in \mathcal{S}_{M,r}.$$

Denote $\pi_r = P_{U_1} \in \mathcal{S}_{M,r}$, $\tilde\pi_r = P_{U_2} \in \mathcal{S}_{M,r}$, and $\Delta = \pi_r - \tilde\pi_r$. Recall that $P_{U_i}^2 = P_{U_i}$ and $P_{U_i}' = P_{U_i}$ for $i = 1, 2$. For any $B = (b_{ij})_{i,j=1}^M \in \mathbb{R}^{M\times M}$, $\operatorname{vec}(B)$ denotes the associated vector obtained by stacking its columns,

$$\operatorname{vec}(B) := (b_{11}, \dots, b_{M1}, b_{12}, \dots, b_{M2}, \dots, b_{MM})' \in \mathbb{R}^{M^2}.$$
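The two vectorization identities used next, and the norms of the block-diagonal matrix $\tilde\Delta = \operatorname{diag}(\Delta, \dots, \Delta)$, can be verified numerically. This NumPy sketch (the helper names `vec` and `random_projection` are illustrative) builds $\tilde\Delta$ as the Kronecker product $\mathrm{Id}_M \otimes \Delta$ and uses column-major reshaping for $\operatorname{vec}$.

```python
import numpy as np

rng = np.random.default_rng(3)
M, r = 5, 2

def random_projection(M, r, rng):
    """A rank-r orthogonal projection onto the span of r Gaussian vectors (illustrative choice)."""
    Q, _ = np.linalg.qr(rng.standard_normal((M, r)))
    return Q @ Q.T

def vec(B):
    """Stack the columns of B: (b_11, ..., b_M1, b_12, ..., b_MM)'."""
    return B.reshape(-1, order="F")

P1, P2 = random_projection(M, r, rng), random_projection(M, r, rng)
Delta = P1 - P2
Delta_tilde = np.kron(np.eye(M), Delta)         # diag(Delta, ..., Delta) in R^{M^2 x M^2}
E = rng.standard_normal((M, M))
C = rng.standard_normal((M, M))

quad_ok = np.isclose(np.linalg.norm(P1 @ E) ** 2 - np.linalg.norm(P2 @ E) ** 2,
                     vec(E) @ Delta_tilde @ vec(E))
lin_ok = np.isclose(np.trace(C.T @ P1 @ E) - np.trace(C.T @ P2 @ E),
                    vec(C) @ Delta_tilde @ vec(E))
norms_ok = (np.isclose(np.linalg.norm(Delta_tilde), np.sqrt(M) * np.linalg.norm(Delta))
            and np.isclose(np.linalg.norm(Delta_tilde, 2), np.linalg.norm(Delta, 2)))
print(quad_ok, lin_ok, norms_ok)
```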

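Observe furthermore that

$$\|P_{U_1}E\|_{S_2}^2 - \|P_{U_2}E\|_{S_2}^2 = \operatorname{tr}\big(E'(P_{U_1} - P_{U_2})E\big) = \sum_{l,k,i=1}^M \Delta_{lk}E_{li}E_{ki} = \operatorname{vec}(E)'\tilde\Delta\operatorname{vec}(E),$$

where $\tilde\Delta$ denotes the block-diagonal matrix $\operatorname{diag}(\Delta, \dots, \Delta) \in \mathbb{R}^{M^2\times M^2}$, and, analogously,

$$\operatorname{tr}(C'P_{U_1}E) - \operatorname{tr}(C'P_{U_2}E) = \operatorname{vec}(C)'\tilde\Delta\operatorname{vec}(E).$$

Noting that $\|\tilde\Delta\|_{S_2} = \sqrt{M}\|\Delta\|_{S_2}$ and $\|\tilde\Delta\|_{S_\infty} = \|\Delta\|_{S_\infty}$, Bernstein's inequality for quadratic forms of Gaussian variables (see, for instance, Bechar (2009), Lemma 0.2) yields the exponential bound

$$\mathbb{P}\Big( Z^{\sigma,M,r}_{P_{U_1}} - Z^{\sigma,M,r}_{P_{U_2}} \ge 2\sqrt{\big(\sigma^4 M\|P_{U_1} - P_{U_2}\|_{S_2}^2 + 2\sigma^2\|\tilde\Delta\operatorname{vec}(C)\|_2^2\big)\,t} + 2\sigma^2\|P_{U_1} - P_{U_2}\|_{S_\infty}\,t \Big) \le \exp(-t) \tag{4.3}$$

for all $t > 0$. Note that the bound is fully symmetric in $U_1$ and $U_2$. Since $\|\tilde\Delta\operatorname{vec}(C)\|_2^2 = \|\Delta C\|_{S_2}^2 \le \|\Delta'\Delta\|_{S_1}\|CC'\|_{S_\infty} = \|\Delta\|_{S_2}^2\|C\|_{S_\infty}^2$, it follows that

$$\sigma^4 M\|P_{U_1} - P_{U_2}\|_{S_2}^2 + 2\sigma^2\|\tilde\Delta\operatorname{vec}(C)\|_2^2 \le \big(\sigma^4 M + 2\sigma^2\|C\|_{S_\infty}^2\big)\|P_{U_1} - P_{U_2}\|_{S_2}^2, \tag{4.4}$$

i.e. the exponential tail bound for the differences $Z^{\sigma,M,r}_{P_{U_1}} - Z^{\sigma,M,r}_{P_{U_2}}$ is characterized by an interplay of Hilbert-Schmidt and spectral norm, which take over the roles of the $\ell_2$- and $\ell_\infty$-norms of the classical Bernstein inequality from the vector case. We prove first the case $r \le M-r$. Note that

$$\sup_{\pi_r,\tilde\pi_r\in\mathcal{S}_{M,r}} \|\pi_r - \tilde\pi_r\|_{S_2}^2 \le 2r \quad\text{and}\quad \sup_{\pi_r,\tilde\pi_r\in\mathcal{S}_{M,r}} \|\pi_r - \tilde\pi_r\|_{S_\infty} \le 2. \tag{4.5}$$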
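A quick empirical check of (4.5) and of the inequality $\|\Delta C\|_{S_2} \le \|\Delta\|_{S_2}\|C\|_{S_\infty}$ behind (4.4), over random pairs of rank-$r$ projections (NumPy assumed; the sampling scheme is an arbitrary illustrative choice). It also reflects the standard fact that the difference of two orthogonal projections has spectral norm at most 1, so the constant 2 in (4.5) is conservative.

```python
import numpy as np

rng = np.random.default_rng(4)
M, r = 8, 3

def random_projection(M, r, rng):
    """Rank-r orthogonal projection onto the span of r Gaussian vectors (illustrative choice)."""
    Q, _ = np.linalg.qr(rng.standard_normal((M, r)))
    return Q @ Q.T

max_fro2, max_op, product_bound_ok = 0.0, 0.0, True
for _ in range(500):
    D = random_projection(M, r, rng) - random_projection(M, r, rng)
    max_fro2 = max(max_fro2, np.linalg.norm(D) ** 2)   # (4.5): at most 2r
    max_op = max(max_op, np.linalg.norm(D, 2))          # (4.5): at most 2; in fact at most 1
    C = rng.standard_normal((M, M))
    # inequality used for (4.4): ||Delta C||_{S_2} <= ||Delta||_{S_2} ||C||_{S_inf}
    product_bound_ok &= bool(np.linalg.norm(D @ C)
                             <= np.linalg.norm(D) * np.linalg.norm(C, 2) + 1e-10)
print(max_fro2, 2 * r, max_op, product_bound_ok)
```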
Since $Z^{\sigma,M,r}_{\pi_r}$ depends continuously on $\pi_r$, it holds that

$$\mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\big(Z^{\sigma,M,r}_{\pi_r} - Z^{\sigma,M,r}_{\bar\pi_r}\big) = \mathbb{E}\sup_{\pi_r\in S}\big(Z^{\sigma,M,r}_{\pi_r} - Z^{\sigma,M,r}_{\bar\pi_r}\big)$$

for any countable, dense subset $S$ of $\mathcal{S}_{M,r}$, and by the theorem of monotone convergence, it is sufficient to assume subsequently that $S$ is finite. We now define recursively an increasing family of partitions $(\mathcal{A}_n)_{n\ge0}$ of $S$ such that $\mathcal{A}_0 = \{S\}$, and for $n \ge 1$ and $A \in \mathcal{A}_n$

$$\text{(i)}\ \|\pi_r - \tilde\pi_r\|_{S_2} \le 2^{-n}\sqrt{2r} \quad\text{and}\quad \text{(ii)}\ \|\pi_r - \tilde\pi_r\|_{S_\infty} \le 2^{-n+1} \qquad \forall\,\pi_r, \tilde\pi_r \in A, \tag{4.6}$$

with $\mathcal{A}_{n+1}$ refining $\mathcal{A}_n$ for all $n \ge 0$. This can be realized as follows. For $n = 0$, $\mathcal{A}_{0,2} := \mathcal{A}_{0,\infty} := \{S\}$. Recall the covering number $N(\mathcal{E}, d, \delta)$ from Section 2.1. It is proved in Szarek (1982), see also Pajor (1998) for a different proof, that for $1 \le r \le M-r$ and any $C' \ge \varepsilon > 0$

$$N(\mathcal{S}_{M,r}, d_{S_2}, \varepsilon\sqrt{r}) \le \Big(\frac{C'}{\varepsilon}\Big)^{r(M-r)} \quad\text{and}\quad N(\mathcal{S}_{M,r}, d_{S_\infty}, \varepsilon) \le \Big(\frac{C'}{\varepsilon}\Big)^{r(M-r)} \tag{4.7}$$

for some universal constant $C' > 0$. Hence, $S$ can be covered with at most
( C iy{M-r) s 2 -balh B l>2 ,...,B Nl2>2 of radius y/2r, and with at most (C') r{M ~ r) 
S'oo-balls -Bi.oo, ■■■,Bpf 1 o0!00 of radius 2. From such finite coverings of S 2 - and S<x,- 
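balls, the partitions $\mathcal{A}_{1,2}$ and $\mathcal{A}_{1,\infty}$ are canonically constructed by

$$\mathcal{A}_{1,j} := \Big\{ \Big(B_{k,j} \setminus \bigcup_{l<k} B_{l,j}\Big) \cap S : k = 1, \dots, N_{1,j} \Big\}, \quad j = 2, \infty.$$

For $n \ge 2$ we proceed inductively using the bounds (8.11) and (8.15) for $q = \infty$ of Lemma 8.2. Indeed, each element $A \in \mathcal{A}_{n-1,j}$ is an element of an $S_j$-ball in $\mathcal{S}_{M,r}$ of radius $2^{-n}\sqrt{2r}$ and $2^{-n+1}$, respectively, and can be partitioned as above into $(2C')^{r(M-r)}$ subsets of balls of radius $2^{-(n+1)}\sqrt{2r}$ and $2^{-n}$, respectively. By construction, the partitions $(\mathcal{A}_{n,2})_{n\ge0}$ and $(\mathcal{A}_{n,\infty})_{n\ge0}$ are nested, and $\operatorname{card}(\mathcal{A}_{n,2}) \le D^{nr(M-r)}$, $\operatorname{card}(\mathcal{A}_{n,\infty}) \le D^{nr(M-r)}$ for some universal constant $D > 0$. Setting now

$$\mathcal{A}_n := \{A_2 \cap A_\infty : A_2 \in \mathcal{A}_{n,2} \text{ and } A_\infty \in \mathcal{A}_{n,\infty}\}$$

yields a partition with the above-mentioned properties (4.6). Obviously,

$$\operatorname{card}(\mathcal{A}_n) \le \operatorname{card}(\mathcal{A}_{n,2})\operatorname{card}(\mathcal{A}_{n,\infty}) \le D^{2nr(M-r)}. \tag{4.8}$$

For each $n \ge 1$ and $A \in \mathcal{A}_n$, let $s_n(A)$ be some arbitrarily chosen rank-$r$ projection matrix in $A$. For each $s \in S$ and $n \ge 1$, there exists some unique $A \in \mathcal{A}_n$ with $s \in A$, and we set $\Pi_n(s) = s_n(A)$. When $n = 0$, define $\Pi_0(s) = \bar\pi_r$. Now,

$$\begin{aligned}
\mathbb{E}\sup_{s\in S}\big| Z^{\sigma,M,r}_s - Z^{\sigma,M,r}_{\bar\pi_r} \big|
&= \mathbb{E}\sup_{s\in S}\Big| \sum_{n\ge0}\big( Z^{\sigma,M,r}_{\Pi_{n+1}(s)} - Z^{\sigma,M,r}_{\Pi_n(s)} \big) \Big| \qquad\text{(4.9)}\\
&\le \sum_{n\ge0}\mathbb{E}\sup_{s\in S}\big| Z^{\sigma,M,r}_{\Pi_{n+1}(s)} - Z^{\sigma,M,r}_{\Pi_n(s)} \big|. \qquad\text{(4.10)}
\end{aligned}$$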
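Note that the decomposition $\sum_{n\ge0}\big(Z^{\sigma,M,r}_{\Pi_{n+1}(s)} - Z^{\sigma,M,r}_{\Pi_n(s)}\big)$ of each $Z^{\sigma,M,r}_s - Z^{\sigma,M,r}_{\bar\pi_r}$ in (4.9) is finite since $S$ is finite. By construction,

$$\sup_{s\in S}\|\Pi_n(s) - s\|_{S_2} \le 2^{-n}\sqrt{2r} \quad\text{and}\quad \sup_{s\in S}\|\Pi_n(s) - s\|_{S_\infty} \le 2^{-n+1},$$

hence, by the triangle inequality, $\|\Pi_{n+1}(s) - \Pi_n(s)\|_{S_2} \le 3\cdot 2^{-(n+1)}\sqrt{2r}$ and $\|\Pi_{n+1}(s) - \Pi_n(s)\|_{S_\infty} \le 3\cdot 2^{-n}$ for every $s \in S$. Note that $\operatorname{card}(\{\Pi_{n+1}(s) - \Pi_n(s) : s \in S\}) \le \operatorname{card}(\mathcal{A}_{n+1})\operatorname{card}(\mathcal{A}_n)$, $n \in \mathbb{N}$. Applying now Lemma A.1 of van der Vaart (1996) to each expectation within (4.10), together with (4.3) and (4.4), yields

$$\mathbb{E}\sup_{s\in S}\big| Z^{\sigma,M,r}_s - Z^{\sigma,M,r}_{\bar\pi_r} \big| \lesssim \sum_{n\ge0}\Big( 2^{-n}\sqrt{r}\,\sqrt{\sigma^4 M + 2\sigma^2\|C\|_{S_\infty}^2}\,\sqrt{\log N_n} + 2^{-n}\sigma^2\log N_n \Big) \lesssim \sigma^2 rM + \sigma r\sqrt{M}\,\|C\|_{S_\infty},$$

where $N_n := \operatorname{card}(\mathcal{A}_{n+1})\operatorname{card}(\mathcal{A}_n) \le D^{(4n+2)r(M-r)}$. Note further that the case $r = M$ is obvious; the case $M > r > M-r$ follows by consideration of the orthogonal complements, $\|P_U(C+E)\|_{S_2}^2 = \|C+E\|_{S_2}^2 - \|P_{U^\perp}(C+E)\|_{S_2}^2$. $\square$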
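The final summation in the chaining argument can be checked by hand. As a sketch, assume that the $n$-th chaining term is bounded, up to a universal constant, by $2^{-n}\sqrt{r}\,\sqrt{\sigma^4 M + 2\sigma^2\|C\|_{S_\infty}^2}\,\sqrt{\log N_n} + 2^{-n}\sigma^2\log N_n$, and use $\log N_n \le (4n+2)\,rM\log D$ with $D \ge 1$, which follows from $N_n \le D^{(4n+2)r(M-r)}$. Then, with $\sum_{n\ge0} n2^{-n} = 2$ and $\sum_{n\ge0} 2^{-n} = 2$:

```latex
\begin{aligned}
\sum_{n\ge 0} 2^{-n}\sqrt{r}\,\sqrt{\sigma^4 M + 2\sigma^2\|C\|_{S_\infty}^2}\,\sqrt{\log N_n}
  &\le \sqrt{\log D}\; r\sqrt{M}\,\big(\sigma^2\sqrt{M} + \sqrt{2}\,\sigma\|C\|_{S_\infty}\big)
       \sum_{n\ge 0} 2^{-n}\sqrt{4n+2},\\
\sum_{n\ge 0} 2^{-n}\sigma^2 \log N_n
  &\le \sigma^2 rM\log D \sum_{n\ge 0} 2^{-n}(4n+2) = 12\,\sigma^2 rM\log D,\\
\sum_{n\ge 0} 2^{-n}\sqrt{4n+2}
  &\le \sum_{n\ge 0} 2^{-n}(4n+2) = 4\cdot 2 + 2\cdot 2 = 12 .
\end{aligned}
```

Adding the first two lines gives a bound of the form $\lesssim \sigma^2 rM + \sigma r\sqrt{M}\,\|C\|_{S_\infty}$, with constants depending only on $D$.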



5. The main result: a general expectation bound for non-centered Gaussian matrices. Corollary 4.2 (i) provides an upper bound on $\delta_{C,M,\sigma^2,r}$ which is valid for every $M\times M$-matrix $C$, and which is achieved, for instance, for $C = 0$ and $r = 1$ in the small amplitude regime, and for $C = a\,\mathrm{Id}$ with $|a|$ sufficiently large in the large amplitude regime, i.e. $|a| \gg \sigma\sqrt{M}$. In this section, we present a new and more refined analysis for bounding $\delta_{C,M,\sigma^2,r}$ which takes advantage of a potentially favorable structure of $C$, resulting in the presence of the deterministic compensation term as explained in Section 2. A prototype of matrices of "high accuracy" in the large amplitude regime is identified and analyzed.

Our approach is motivated and explained in what follows. The conjecture about the possibility of improvement over Corollary 4.2 (i) for a certain type of matrices follows from the fact that, in contrast to the situation in Section 3, the differences

$$\|\pi_r(C+E)\|_{S_2}^2 - \|\bar\pi_r(C+E)\|_{S_2}^2 \tag{5.1}$$

are usually not centered. With $\bar\pi_r = \bar\pi_r(C)$ maximizing the expression $\|\pi_r C\|_{S_2}^2$ over $\mathcal{S}_{M,r}$ here and subsequently, the expectation of (5.1) is less or equal to zero. Depending on the structure of $C$, it may be substantially smaller than zero for an appreciable amount of rank-$r$ projections. Consequently, for any subset $\mathcal{A}_C \subset \mathcal{S}_{M,r}$,

$$\begin{aligned}
&\sup_{\pi_r\in\mathcal{A}_C}\Big( \|\pi_r(C+E)\|_{S_2}^2 - \|\bar\pi_r(C+E)\|_{S_2}^2 \Big)\\
&\quad\le \sup_{\pi_r\in\mathcal{A}_C}\Big( \|\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\pi_r(C+E)\|_{S_2}^2 - \big[\|\bar\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r(C+E)\|_{S_2}^2\big] \Big) - \Delta_{\mathcal{A}_C}
\end{aligned}\tag{5.2}$$

with

$$\Delta_{\mathcal{A}_C} := \inf_{\pi_r\in\mathcal{A}_C}\Big( \mathbb{E}\|\bar\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\pi_r(C+E)\|_{S_2}^2 \Big).$$



Roughly speaking, the expectation of the supremum in (5.2) is small if the supremum of the corresponding centered process over $\mathcal{A}_C$ is small as compared to $\Delta_{\mathcal{A}_C}$. The idea for the general bound is based on decomposing the Grassmann manifold along the geometric grid of slices $\mathcal{A}_{C,k}$, $k \in \mathbb{N}$:

$$\mathcal{A}_{C,k} := \Big\{ \pi_r \in \mathcal{S}_{M,r} : \frac{\|\bar\pi_r C\|_{S_2}^2}{2^{k+1}} < \mathbb{E}\|\bar\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\pi_r(C+E)\|_{S_2}^2 \le \frac{\|\bar\pi_r C\|_{S_2}^2}{2^k} \Big\}.$$
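To make the slicing concrete, the following sketch (NumPy assumed; `slice_index` is a hypothetical helper) computes, for a given rank-$r$ projection, the index $k$ of the slice it belongs to, using $\mathbb{E}\|\bar\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\pi_r(C+E)\|_{S_2}^2 = \|\bar\pi_r C\|_{S_2}^2 - \|\pi_r C\|_{S_2}^2$. Projections with zero gap, which also maximize $\|\pi_r C\|_{S_2}^2$, are reported separately, and the sketch starts counting at $k = 0$.

```python
import numpy as np

def slice_index(C, pi, r):
    """Index k of the slice containing the rank-r projection pi, i.e. the k with
    ||pi_bar C||^2 / 2^{k+1} < ||pi_bar C||^2 - ||pi C||^2 <= ||pi_bar C||^2 / 2^k.
    Returns None if the gap vanishes (pi is itself a maximiser of ||pi C||_{S_2}^2)."""
    lam = np.linalg.svd(C, compute_uv=False)
    top = np.sum(lam[:r] ** 2)                      # ||pi_bar_r C||_{S_2}^2
    gap = top - np.linalg.norm(pi @ C) ** 2         # = E||pi_bar X||^2 - E||pi X||^2
    if gap <= 1e-12 * top:
        return None
    return int(np.floor(np.log2(top / gap)))

C = np.diag([2.0, 1.0, 0.0])
u = np.array([np.sqrt(0.9), np.sqrt(0.1), 0.0])    # slightly tilted away from e_1
print(slice_index(C, np.outer(u, u), 1),            # gap 0.3, top 4
      slice_index(C, np.diag([0.0, 1.0, 0.0]), 1),  # gap 3
      slice_index(C, np.diag([1.0, 0.0, 0.0]), 1))  # gap 0
```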



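Define the random variables

$$Y_k := \sup_{\pi_r\in\mathcal{A}_{C,k}}\Big( \|\pi_r(C+E)\|_{S_2}^2 - \|\bar\pi_r(C+E)\|_{S_2}^2 \Big)$$

and

$$Y_k^0 := \sup_{\pi_r\in\mathcal{A}_{C,k}}\Big( \|\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\pi_r(C+E)\|_{S_2}^2 - \big[\|\bar\pi_r(C+E)\|_{S_2}^2 - \mathbb{E}\|\bar\pi_r(C+E)\|_{S_2}^2\big] \Big).$$

With

$$Z := \sup_{\pi_r\in\mathcal{S}_{M,r}:\ \|\pi_r C\|_{S_2} = \|\bar\pi_r C\|_{S_2}}\Big( \|\pi_r(C+E)\|_{S_2}^2 - \|\bar\pi_r(C+E)\|_{S_2}^2 \Big),$$

we obtain the series expansion

$$\begin{aligned}
\delta_{C,M,\sigma^2,r} &= \mathbb{E}\sup_{\pi_r\in\mathcal{S}_{M,r}}\Big( \|\pi_r(C+E)\|_{S_2}^2 - \|\bar\pi_r(C+E)\|_{S_2}^2 \Big)\\
&\le \mathbb{E}Z + \sum_{k\in\mathbb{N}}\mathbb{E}(0 \vee Y_k)\\
&\le \mathbb{E}Z + \sum_{k\in\mathbb{N}}\mathbb{E}\big(0 \vee (Y_k^0 - \Delta_{\mathcal{A}_{C,k}})\big).
\end{aligned}\tag{5.3}$$

In view of this expansion, it is clear that good bounds can be reached in principle only if $\mathbb{E}Y_k^0$ is small as compared to $\Delta_{\mathcal{A}_{C,k}}$. The evaluation of the expectations

$$\mathbb{E}\big(0 \vee (Y_k^0 - \Delta_{\mathcal{A}_{C,k}})\big), \quad k \in \mathbb{N},$$

is however quite a difficult task in general: it requires a suitable characterization of the subsets $\mathcal{A}_{C,k}$ of the Grassmannian for general matrices $C \in \mathbb{R}^{M\times M}$ and tight bounds on their metric entropy. Before discussing this serious issue, we first present the main result of this section.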
Theorem 5.1. Let $(E_{ij})_{i,j=1}^M$ be a centered matrix of independent Gaussian entries with variance $\sigma^2$. Then for any $C \in \mathbb{R}^{M\times M}$ with $\operatorname{rank}(C) \ge r$ and $r \le M-r$, the following bound holds true:

$$\delta_{C,M,\sigma^2,r} \lesssim \sigma^2 rM\,\big(\mathrm{I} + \min(\mathrm{II}, \mathrm{III})\big),$$

where, with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_M$ denoting the singular values of $C$,

, ■ ,'A? , , Al 

1 = mm 



\V oVmj' 

ly-2r \2n1/2 x 

r £-ii=r+l A l \ M 



II r ^ 8 ;7 ■— i=, and 

X 2 J o-VM 



X 2 

III = — r-2 — if A r +i < A r , and III = 00 else. 

X r — A r+1 

In particular for the case $r = 1$, the bound applies to any $0 \ne C \in \mathbb{R}^{M\times M}$. An immediate consequence of Theorem 5.1 is the following.



Corollary 5.2. Let $C_{\alpha,\beta,r} \in \mathbb{R}^{M\times M}$ with singular values $\lambda_i = \alpha$ for $i \le r$ and $\lambda_i = \beta < \alpha$ for $i > r$. As usual, set $c/0 := \infty$ for any $c > 0$. Then

8c a>f3 , r ,M,a 2 ,r ^ o- 2 rM I 1 + min f 2 - J I /or e^ery < /3 < a. 

Remark. Corollary 5.2 covers the two extreme cases:

(i) Prototype of high accuracy in the "large amplitude" regime $\alpha \gg \sigma\sqrt{M}$:

$$\delta_{C_{\alpha,0,r},M,\sigma^2,r} \lesssim \sigma^2 rM \quad\text{for every } \alpha > 0.$$

(ii) Prototype of weak accuracy in the "large amplitude" regime $\alpha \gg \sigma\sqrt{M}$: the bound of Corollary 5.2 approaches the unimprovable upper bound for matrices $C$ of constant singular value spectrum as $\beta \to \alpha$, see Proposition 3.1.

Note that the upper bound in case $C = 0$ is covered by Proposition 4.1.
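A small simulation contrasting the two extreme cases of Corollary 5.2 in the large amplitude regime (NumPy assumed; $M = 20$, $r = 2$, $\alpha = 200$ and the replication count are arbitrary choices). With $\beta = 0$ the estimated $\delta$ should stay of the order $\sigma^2 rM$, whereas for the constant spectrum $\beta = \alpha$ it should be much larger. The paired estimator is our own variance-reduction choice and has the same mean as $\delta_{C,M,\sigma^2,r}$.

```python
import numpy as np

def delta_mc(C, sigma, r, reps, rng):
    """Monte Carlo estimate of delta_{C,M,sigma^2,r}: mean of sum_{i<=r} lam_hat_i^2 - ||pi_bar_r X||_F^2."""
    M = C.shape[0]
    U_r = np.linalg.svd(C)[0][:, :r]
    total = 0.0
    for _ in range(reps):
        X = C + sigma * rng.standard_normal((M, M))
        lam_hat = np.linalg.svd(X, compute_uv=False)
        total += np.sum(lam_hat[:r] ** 2) - np.linalg.norm(U_r.T @ X) ** 2
    return total / reps

rng = np.random.default_rng(5)
M, r, sigma, alpha = 20, 2, 1.0, 200.0           # alpha >> sigma * sqrt(M)
C_high = np.diag([alpha] * r + [0.0] * (M - r))   # beta = 0: spectral gap at r
C_weak = alpha * np.eye(M)                        # beta = alpha: constant spectrum
d_high = delta_mc(C_high, sigma, r, 300, rng)
d_weak = delta_mc(C_weak, sigma, r, 300, rng)
scale = sigma**2 * r * M
print(d_high / scale, d_weak / scale)
```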

The proof of Theorem 5.1 is deferred to Section 8. Subsection 8.1 deals with the description of the sets $\mathcal{G}_{M,r}(\delta) := \{\pi_r \in \mathcal{S}_{M,r} : \|\bar\pi_r C\|_{S_2}^2 - \|\pi_r C\|_{S_2}^2 \le \delta\}$, which characterize the slices $\mathcal{A}_{C,k} = \mathcal{G}_{M,r}(2^{-k}\|\bar\pi_r C\|_{S_2}^2) \setminus \mathcal{G}_{M,r}(2^{-(k+1)}\|\bar\pi_r C\|_{S_2}^2)$. It is shown that these sets can be approximated by sets of very simple geometric structure. In case of a substantial spectral gap at $r$, this approximation is very tight. Sharp bounds on their metric entropy are derived in Subsection 8.2. The final arguments differ slightly from the description at the beginning of this section. They are given in Subsection 8.3.
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6. Lower bounds. The best possible upper bound for the accuracy of aproxi- 
mation in Theorem 5.1 is of the order a 2 rM, which is attained for matrices C = 
or C = ald r in the large amplitude regime. The question arises whether this bound 
is sharp, i.e. whether it indicates some fundamental limit on the accuracy of approx- 
imation. For fixed A £ j^AfxJi/ -^j^ ||-^-||Soo = 1 an d some arbitrary real number 
a £ R, inspection of 8 a A,M,a 2 .r shows 

E||^ r (aA + E)|| o - E||7r r (aA + E)||o 

II v / || £ 2 II v 'II &2 

(6.1) = E| sup ||5frE||g - ||7r r E||g + 2atr(E'(7r r -7r r )-A 

(6.2) -a 3 (||7r r A||* - ||7r r A||" 

y ' \ ii ii 02 n ii 02 

Now, ||7r r C||| 2 — ||7ry-C||s a > since 7r r optimizes the Hilbert-Schmidt norm, and the 
dependence on a in the deterministic compensation term in line (6.2) is quadratic 
while it is only linear in the stochastic part in line (6.1). So, for any fixed a 2 , r, 
M and A, one may wonder whether the accuracy of approximation E||7r r (ald r + 
E)||| 2 — E||7r r (ald r +E)||| tends even to zero as \a\ goes to infinity if A is suitably 
chosen, like, for instance, A — Id r . In this section, we demonstrate that this is not 
the case. We provide some complete proof of the conjecture 

(6-3) inf 8 C ,M,<r 2 ,r Z t>0,M,<r 3 ,r 

in case r = 1. For r > 1 we present some partial solution in the large amplitude 
regime. 

6.1. The universal lower bound for r = 1. 

Theorem 6.1. Let (E^ )^ =1 be a centered matrix of independent Gaussian entries 
with variance a 2 . Then there exists some Mq £ N, independent of a 2 , such that 

(6-4) inf 5 c ,m,c?2,i Z <5o.Af.<r 2 ,i 

cei MxM 

for any M > Mq. 

PROOF. In view of Lemma 4.3 and its subsequent remark, it is sufficient to prove 
that for every /3 > 0, there exists some constant cp > 0, independent of a 2 and M, 
such that 

(6.5) inf > c 8 o- 2 (M-l) for all M > 2. 

CeR MxM. 

\\C\\ Soo >P<yVM 

Let C = Yli=i ^iUiV( denote some singular value decomposition of C, and define 
tx\ := U\U[. Since for any 1 < s < M and ir s , tt s £ Sm,s, k s — tc s = (Id — Tr s )if s — 
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7r s (Id — tt s ) is an orthogonal decomposition and || 7r s (Id — 7r s )||| 2 = ||(Id — 7r s )7r s ||| , 
observe that 



kiCHL - H5riC||| a < lkiC||| a - IkiTiClis. 



' r r " 12 "ttiCIII, - Iki7nc||! : 

2 f\\_ II 2 n~ _ || 2 



= K{hi\\k - IkiTills, 

= AjIKW-TrijTnlli 



x 2 

2 . . A l ||~ „ ||2 



Ai||7ri(Id-7ri)||^= -^Iki-Tnl,^. 



Consequently, 



6c,m,o*,i = E||7ri(a + E)||^ - E|[7ri(C + E)||| a 

= E( sup j IliiEll 2 - ||7riE|| 2 +2tr(E'(?Fi-7ri)C 



niESt. 



\S 2 N"^IIS 2 



ii 2 n ii ? 

I I 02 " " '-'2 



> E( sup | ||^iE|| 2 2 - |7riE||| +2tr(E'(7ri-7ri)C 



-flkr-iill^ 
(6.6) > E( sup <{ IliiEll 2 - llvriEll 2 +2tr(E'(^i-7ri)C 



ti£Sm,iW 



\S 2 W" 1 ^i\S 2 



(6-7) -ylki-ii|| 2 S2 

for any subset 

Sma(S) := {tti G 5 M> i : Iki - jri|| Sa < V2d}, 5 > 0. 

The idea of the proof is to choose 5 — 6(M, er 2 , Ai, (3) in some specific way in order 
to guarantee that the deterministic compensation term in (6.7) is lower bounded 
by — X^S, and to pick afterwards some suitable projection in dependence of C and 
E out of this class which realizes the lower bound in expectation. Since Tf rt u = 
U'n r U e Sm,t f° r an Y K r € Sm,t and U e O(M), Fiu,v = U'EV =v E for any two 
fixed orthogonal matrices U and V, and ||L/'Ay||| 2 = ||^4||| a for any A € R MxM , 
i.e., 

||^ r ([/AT/' + E)|| 2 2 - ||7r r ^(A + E^ y )|| 2 2 , 

we may and do assume subsequently without loss of generality that U = V = Id 
and 7Ti = Idi in particular. With ei, e2, ■•-, &m denoting the canonical basis vectors 
of M. M , every projection matrix in 5m, l can be written as 



M M 
i=l i=\ 


M 
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In order bound Sc.m,<t 2 ,i from below by means of (6.6) - (6.7), define 

(M-l)g» 

for some constant d € (0, min(/3 2 , 1)) to be chosen later. Note that d shall be chosen 
independently of M, a 2 , Ai and C, but is allowed to depend on /3 only. Furthermore, 
§* < 1/2 because Ai > ftoyfM. Now, since 



7Tl j7 Idl 



/ 7? 

7172 

: 

V 7i 7a/ 







0/ 



the constraint 



7ti, 7 €<Sm,i (<***) «*■ ||7ri, 7 -Idi||| 2 < 25,* <S> ||(Id-7ri, 7 )Idi||| 2 < <5„ 
translates into 

M 

((l-7! 2 ) 2 +E^ 2 - (I-TxT+tKI-Ti 2 ) =) 1-7? < ***• 



i=2 



With the choice 



7? := v 7 ! - $** and 7* := sign(E a ) Sii /VM - 1 for » = 2, ..., Af , 

it holds that ||7Ti, 7 * — Idi||| < 2(5**, i.e. 7Ti iT * belongs to Sm.i(3**)- Together with 

5 = £*„ in (6.6) - (6.7), this yields 

(6.8) 



,1 > E( ||7ri >7 *E|[^ a - ||ldiE|| 2 ?2 +2trfE'(i : i i7>f -Idi)C*") - da 2 (M-r] 



C,M,(7 2 , 



We evaluate the expressions within the expectation separately. For any 1 < i,j < 
M, 

M 



(TT hl E).. = ^7i7iEi. 



?=i 
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Since X^i=i it = 1> sig n (Eii) and |Eji| are stochastically independent, and E^-, 
1 < z, j < M, are independent by assumption with E(Ejj) = and Esign(Ey) = 0, 

M / M N 2 



in / IW \ . 

E||7ri l7 .E||| a = £ E (E^^ E «) 



■i3= 
M A/ M 

= EElfYE(^) + E l7r7?|E|E,iE M 

ijti' 

M 

= Ma 2 + E |7*7?|E|E,iEm|. 

U'>2: 



Therefore, 



(6.9) E(||^ 7 ,E|| 2 S2 -||ld 1 E|| 2 S2 ) > 0. 
Next, we decompose 

(6.10) 2tr(E'? 1>7 ,c) = 2tr(E'^ 7 JdiC) + 2tr(E'7fi )7 *(Id - Idi)C 

In order to check Etr(E'7Ti !7 *(Id — Idi)C) = 0, it is sufficient to notice that all 
entries of the matrix E'7Ti j7 *(Id — Idi) have expectation 0. Indeed, its first column 
is equal to zero, and one easily verifies for the remaining indices 1 < i < M, 
2<j<M that E(E H 7*7,*) = for every 1 < I < M, hence 

M 

(E(E , 7ri, r (Id-Idi))).. - E E ( Em %*7*) = 0. 
lJ i=i 

Together with (6.9) and IdiC = Aildi, (6.8) reduces to 



dc,M,&- 



A > E(2trrE'(5f l!7i -Idi)IdiC) - da 2 (M - 1) ] 
= EpAitrfE'^i^Jdi") - d(T 2 {M-l)\ 

= a-2\ l ^Lf(^ ll /cy) + Y, 1 * ll :{^ l /a)\ - da 2 (M-l) 

M 

= (l-^,) 1/a *i* a (^-l)~ 1/2 ^2Ai53 E |Eii/H - do 2 {M-l) 



i=2 



> Vdcr 2 (M-l)-^= - da 2 (M-l). 

Choosing now some d <E (0, min(/3 2 , 1)) such that 2v / rf/v / 27r — d > proves (6.5). □ 

Remark. Theorem 6.1 answers to question (A) from the Introduction in the neg- 
ative. 
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6.2. A partial solution to the conjecture (6.3) for r > 1 in the large ampli- 
tude regime. The specific construction of the (random) projection 7Ti l7 * in the 
proof from the previous paragraph cannot canonically be extended to arbitrary 
r > 1. For the result in this subsection, we use finally a different approach based 
on abstract lower bounds on suprema of Gaussian processes which applies to any 
r < M — r. For the prototype C a — ald r of high accuracy in the large amplitude 
regime \a\ S> u\M of Theorem 5.1, the following result is deduced. Its substantially 
more involved extension to a non-asymptotic optimal lower bound for general ma- 
trices C may also involve the Sudakov-type minoration for Gaussian chaos processes 
(Talagrand (1992)). 

Theorem 6.2. Let (Ey)^C- =1 be a centered matrix of independent Gaussian entries 
with variance a 2 . Let C ayS £ I BxM with singular value decomposition Uk a:S V' , 
where A a _ s — ald s with < a £ R and 1 < s < M . Then 

(6.11) liminf max Sq m a 2 s > a r(M — r). 

The proof of Theorem 6.2 is deferred to Section 9. 

Remark. In view of the polar decomposition of (Id — H r }it r and 7r r (Id — 7i>) which 
shows in particular that these two matrices have the same singular values, we 
conjecture that the bound holds true also without the maximum over s £ {r, M—r}, 
but do not have a rigorous and elegant proof for it yet. Note that this maximum is 
redundant if r = M/2, M £ 2N. 

7. Consequences on statistical estimation problems. LctX = C + E 

as described in (1.3). Let Ai > A2 > ... > Am and Ai > A2 > ... > Am denote 
the singular values of X and C, respectively. Recall that Y^i=i^f = ll^r-X"||| 
and Y^i=i ^i = IKfCHs with the rank-r-projections 7? r and ir r as defined in the 
Introduction. 

7.1. The largest singular value. We begin with the simplest example of esti- 
mating A^, the largest eigenvalue of C'C, based on the observation X = C + E. As 
explained in the introduction, the maximal eigenvalue of X' X is positively biased 
as an estimator for Af , because 

EA^ = E||^iX||| a > E||ttiA:||! 2 = A^ + a 2 M. 

Therefore, one immediate improvement is to consider s := A^ — a 2 M as an estimator 
for X 2 . As Theorem 6.1 reveals for the particular case r = 1, 

(7.1) Es-A^ = E\j-a 2 M - \\ 

is stricly positive and bounded away from zero, uniformly over C £ R MxM . As a 
consequence of Corollary 4.2 and Theorem 6.1, 

fc,M,,M = Vs-\l £ \c 1 a 2 M,c 2 (a 2 M + a^M\\C\\ Sx ) 
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for some universal real constants c\ , C2 > which do not depend on M , a 2 and C, 
and it follows from (1.2) for C — and Proposition 3.1 (ii) for C = aid that these 
bounds cannot be improved in general. Hence, one message of our analysis is: 

The quantity a 2 M always underestimates the bias E(A 2 — A 2 ) by at least 
some universal factor strictly larger than 1, independently on how favorable 
the matrix C is. 

In other words, even after correction by tr 2 M, the difference 

(7.2) EA 2 - \\-a 2 M 

remains of the same order a 2 M at least, independently of C. Moreover, there exist 
matrices C for which (7.1) is not smaller in order than a 2 M+a\/M\\C\\s ac - That is, 
large amplitude IJCIJs^ never improves (in order) the acuracy as compared to C — 
0, but it may result in substantially worse accuracy of approximation. Therefore, 
some further consequence is that small magnitude of a 2 M is necessary but far 
from being sufficient for the bias of s in (7.1) to be small. The worst case error 
is non-asymptotically quantified in terms of HCHs^cr and M in Corollary 4.2 
(i). Theorem 5.1 describes more precisely the effect of the shape of the singular 
value spectrum on the accuracy of approximation. For example, if C = aid, then 
(7.2) grows like |a|o"vM as |a| — > oo, cf. Proposition 3.1 (ii), for any fixed a 2 
and M. On the other hand, if C = aldi, then (7.2) remains bounded by some 
universal constant times a 2 M, independently of a. The same holds true for the full 
rank matrix C — aldi + aid. Consequently, Theorem 5.1 demonstrates that large 
amplitude HCHs^ does not necessarily result in worse accuracy of approximation 
as compared to the case C — 0, and discovers some prototypes of high accuracy in 
the large amplitude regime. Similar conclusions for r > 1, i.e. statistics of the form 
Xa=i ^f > are van d as well. 

7.2. Relative quantities. This subsection is devoted to the consequences of our 
results on relative quantities as described in the introduction. Consider, for instance, 
the ratio 

VrC\\% E[ =1 A 2 






C Wk E£x a? 



Assuming ||C||c to be known, some natural candidate estimator of t r is 



\\n r X\\ 2 o - a 2 rM 



|cp 



s 2 



In this situation, increasing amplitude of C always results in smaller bias of the 
estimator t r . Suppose that C — a A for some real number a£l and an M x M- 
matrix A with \\A\\ Sao = 1. Note that ||A||| 2 > \\A\\ 2 Stx = 1. Then Corollary 4.2 (i) 
yields 



rank(A)>r, ||A|| S 



,~ x ^ o- 2 rM [ \a\ 1 

sup E(t r -t r ) < — <M+-U=^ 

R MxM. a 1 \ a\JM) 
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For every fixed A, a 2 , r and M, the bias E(t r — t r ) tends to zero as |a| — > oo, in 
contrast to the situation for the absolute difference in the previous Subsection 7.1. 
Note that it decreases of the order |a| _1 at least and of the order a~ 2 at most, 
depending on the shape of the singular value spectrum of A, cf. Theorem 5.1. So, 
whereas, for every fixed A, <r 2 , r and M, the absolute difference as described in 
Subsection 7.1 for r = 1 cannot get closer to zero as the amplitude \a\ of C = 
a A increases, independently on how favorable the matrix C may be, the relative 
difference E(t r — t r ) tends to zero as the amplitude goes to infinity for every matrix 
C = a A. The shape of the singular value spectrum however clearly influences the 
accuracy of approximation, in the same fashion as explained in Subsection 7.1, cf. 
Theorem 5.1. 

7.3. Quadratic junctionals of low-rank matrices. One natural candidate for 
estimating ||C||| , based on the observation X = C + E described by (1.3), is the 
unbiased estimator ||X||| — a 2 M 2 . Simple calculation yields 



(7.3) Var(||X||! 2 -CT 2 Af 2 ) = 2a A M 2 + Aa 2 \\C\\ 



One disadvantage of this estimator is its large variance for large values of M: it 
depends quadratically on the dimension. If r = rank(C) < M, then the matrix C 
can be fully characterized by (2M — r)r parameters as can be seen by the singular 
value decomposition. That is, if r <C M, the intrinsic dimension of the problem 
is of the order rM rather than M . Now observe that for every matrix C with 
rank(C) = r, 

licill, = hrC\\%. 

Elementary calculation reveals that ||7r r JT||g — a 2 rM unbiasedly estimates ||C||| 2 , 
and 

(7.4) Vsn(\\7r r X\\ 2 s -<j 2 rM) = 2a 4 rM + 4a 2 \\C\\ 2 



As compared to (7.3), its variance does not depend on the squared dimension M 2 
but grows like rM, which can be substantially smaller. Moreover, 

e(||tivX ||§ 2 - <r 2 rM - \\C\\ 2 S2 - 2o-tr(E'C)) = 2a 4 rM, 

that is, cr -1 (||7r r ..X'|| 2 j — G 2 rM — ||C||| 2 ) is approximately centered normal with 
variance 4||C||| 2 if o 2 rM = o(l) in an asymptotic framework, and 411(71112 is 
the asymptotic efficiency lower bound (Laurent and Massart (2000)). The statistics 
||7r r X||| 2 — a 2 rM , however, cannot be used for estimating ||C||| since ir r = ir r (C) 
depends on C itself and is unknown a priori. Unfortunately, Theorem 6.1 and The- 
orem 6.2 demonstrate that the same result cannot be shown with ||7r r X||| 2 — a 2 rM 
in place of ||7iyX||| — a 2 rM, because the bias E||7? r X||| — E||7r r X||| is of the 
order not smaller than a 2 r(M — r), i.e. 



(E||£ r xn 2 2 -EiKxny 
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is not negligible under the same conditions, even not for very favorable matrices C = 
ald r . That is, empirical low-rank projections ||7r r .X"||g — a 2 rM cannot be succes- 
sively used for efficient estimation of ||C||| , even if the rank(C) <C M is explicitly 
known beforehand. Note that in contrast, empirical low-rank approximations have 
been proved useful when estimating a low-rank matrix C under Hilbert-Schmidt 
norm loss, see Bunea et al. (2011), Koltchinskii (2011), Ncgahban and Wainwright 
(2011), and Rohdc and Tsybakov (2011). 

The problem of quadratic functional estimation in the matrix context appears, for 
instance, in the recent area of low-rank matrix recovery, when one is interested 
in recovering the linear entropy 1 — ||C||| of a quantum density matrix C as an 
approximation of von Neumann entropy based on noisy observations. We refer the 
reader to Artilcs ct al. (2004) for a detailed description of applications in quantum 
state tomography and the recent article of Koltchinskii (2011) for low-rank matrix 
recovery of quantum density matrices. Note however that our results do not take 
into account that a quantum density matrix C is self-adjoint, and the Wigner 
ensemble may behave differently as already outlined in the introduction. In view 
of model selection issues, an estimate of the bias is even required over the whole 
scale r£ {1, ...,M} since the rank is typically unknown a priori and low at most 
approximately, for which reason exact asymptotic results for uniformly bounded 
rank perturbations are of limited value for this application. 

8. Proof of Theorem 5.1. This section is devoted to the proof of our main 
result. Subsection 8.1 deals with the description of the sets GM,r(8) '■= {ftr G $M,r '■ 
\WrC\\% — \\K r C\\g < 8}, which characterize the slices Ac,k = ^/,r(2~ fe ||7r r C||| 2 )\ 
^M,r(2 _ ' fe+1 ' ) ||7r r C|j| 2 ). It is shown that these sets can be approximated by sets 
of simple geometric structure. In case of some substantial spectral gap in r, this 
approximation is very tight. Sharp bounds on their metric entropy are derived in 
Subsection 8.2. The final arguments are given in Subsection 8.3. 

8.1. Characterizing the sets GM,r($)- The first goal for a sophisticated analysis 
is to characterize the sets 



>0. 



QmA$,C) ~ {*r G S M ,r : ||7r r C||i a - ||5f r C||| 2 < &}, 8 

Note that 

Ac.k = eM,,.(2- fc ||7r r q|| 2 )\^ r .(2-( fe + 1 '|| 7 r r C||| 2 ). 

This is a very delicate part and quite involved in general, but we find below a 
tight characterization in case of approximate rank-r-matrices, i.e. those matrices 
for which ||(Id — K r )C\\s 2 is small. For reasons of clarity, we restrict attention to 
matrices of rank larger or equal to r. 

oMxM 



Proposition 8.1. Let C G R , rank(C) > r, with singular values 

r+l ' 



Xi, i = 1,...,M, ordered in decreasing magnitude. Denote A* := YliL r +i *i 
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and 7* := (A~ — Aj? +1 ) ' if A,- > A r +i, and 7* = 00 else. Then for any 

1 2 
\S 2 > 



n r £ Argmax^gs lkrC||| , 



•1) Gu,r(S, C) C< ir r £ S M ,r ■ |K r - 7r r ||s 2 < 



VV^ + A;),^! 



and 

(8.2) {i r G 5 M ,r : |kr-7r r || ft < A^V^j} C Q M , r {6,G). 

PROOF. Let UAV' denote some singular value decomposition of C, with 

A r . := diag(Ai, A2, ..., A r ,0, ...,0) and Aj\/_ r := A — A,.. 

Recall that ir r :— X)I=i QUI ^ s some maximizer of |krC|ls over n r £ Sm.t, where 
the Ui's denote the column vectors of U. Note at this point that due to multiplicities 
in the singular value spectrum, the orthonormal vectors U±, ..., U r are not unique 
in general. As concerns the proof of (8.1), we check first that 

(8.3) lkrC||| 2 - ||7TrC||| 2 > \j - \\ 7T r - 7T r || | 2 - E r 

for all TT r £ Sm,d where E r := \\7r r UAM- r U'\\g — ||(Id — 7r r )7T r (Id — 
~K r )U h.M-rU'\\ 2 s . Using the symmetry of the projection matrices ir r and lx r and 
the invariance of the trace operator under cyclic permutation, we obtain the iden- 
tity 

(8.4) |k r C||! a - |k r C||| 2 

= tr(c"(7r> r -??;?r r )c) 

= tr(cC(7r r -7r r )) 

= tr(AlU'(ir r - TT r n r n r )u] - tx(A 2 M _ r U'(Id - n r )n r {ld - n r )U\ 

r 

(8.5) = J2 x2 i[ U '^r(ld-n r )n r )U].. 



1=1 

M 



(8.6) - Y^ A?[?7 / (Id-7r r )7r r (Id-7r r )i7]. 



2=r+l 



Note that the sum in (8.6) equals S r . Since ir r — ir r Tr r ir r = 7r r (Id — 7r,.)7r r = 7r^,(Id — 
TT r )Tt r is positive semidefinite, all summands in the first term of (8.5) are non- 
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negative. Consequently, 

r 

(8.5) > X 2 r Y J [ U '^r-^r^r)U] ii 
i=\ 
M 



.7) = A?53[l7'(7r r -7r r 7r r 7r r )l7] i . 



= X r tT\Tr r — TT r TT r Tr r ) , 

where (8.7) follows from \U'(jr r — 7r r 7r r 7r r )C/] .. = for i > r. By positive semidefi- 
niteness again and symmetry, the eigenvalue decomposition of 7r r — Tr r Tr r iT r and the 
invariance of the trace operator under basis transformation yield tr(7r r — 7r r 7r r 7r r ) = 
||7r r — 7r r 7r r 7r r ||s'i, hence 

||7r. r C|| s . 2 - \\^rC'\\' s . 2 > X r \\ir r — TTrTTrTTrllsi ~ 3 r . 

Now, since n r = ir r TT r + 7iy(Id — 7T r ) is an orthogonal decomposition, 

lk r 7f r ||| 2 + ||7r r (Id-7r r )||| 2 = ||7r r ||| 2 =r= ||i r ||| 2 = ||7r r 7r r ||| a + ||(Id - w r )w r \\s 3 , 

implying that ||7r r (Id — 7Tr)||s = II (Id — 7iy)?Jr|ls • Consequently, 

(8.8) \\TT r — TT r TT r TT r \\s 1 — tr(^TT r ~ TT r TT r Tr r ^j = tr^TiV — ir' r ir' r ir r ir r ^ 

= lkr||| 2 - |kr7Tr||| 2 

= ||7r r (Id-7r r )||| a = ^IK-^rlll,, 

where the last equality follows by the orthogonality of the decomposition 7f r — ir r = 
(Id — 7r r )7r. r — 7r r (Id — TT r ). This implies (8.3). In order to deduce the bound in (8.1), 
note first that because of the positive semidefiniteness of (Id — 7r r )7r r (Id — 7iy), also 
all summands in the second sum (8.6) are non-negative, whence 

M M 

J2 A 4 2 [(7(Id-^)i r (Id-^)[/'] 2i < X 2 r+1 Y, [U(Id~ir r )n r (te-n r )U'] vi . 

i—r+l i—r+1 

With the same arguments as provided above for n r — 7r r 7r r 7r r — 7r r (Id — 7r r )7r r , we 
deduce 

(8.9) S r < A? +1 i||(Id-7r I ,)-(Id-i r )||| 2 = A? +1 i||7r r -SF r ||! a . 
Moreover, because U'TT r U £ Sm,t for every U € O(M), 

(8.10) E r = \\n r UA M -rU'\\ 2 S2 = \\U'n r UA M -r\\s 2 < J2 X ^ 



i=r+l 



and claim (8.1) follows from (8.3), together with (8.9) and (8.10). The proof of (8.2) 
uses that the expression (8.5) is in turn upper bounded by X\^-K r — , K r TT r 'K r ^s l while 



26 A. ROHDE 

(8.6) is less or equal to zero, and concludes finally with the same equality chain 

(8.8). □ 

We note that in the particular case of rank-r-matrices, 2~Zi= r +i ^* = 0> i- e - the nrs ^ 
term in the upper bound on the radius of the SVball in (8.1) coincides up to the 
ratio Ai/A r with the lower bound in (8.2). Equality holds for rank-r-matrices with 
rectangular singular value spectrum Ai = A2 = • • • = A r . If C = aid for some 
a^O, then A* = o?r and the inclusion (8.1) is trivial: 

< iT r S Sm,t ■ IKr — ^r||s 2 < a~ 1 y/2(S + a 2 r) > = Sm^, for any 5 > 0. 

This is in accordance with the fact that in this case, also GM.r(S) = Sm.7- for any 
5>Q. 



8.2. Metric entropy bounds. The nice feature of the results from the previous 
paragraph is that they enable us to determine tight bounds on the metric entropy 
on these particular subsets of the Grassmannian by the volumetric argument. We 
provide a slightly refined version. Recall at this point the definition of the covering 
numbers. For any totally-bounded, pseudometric space {X,d) and any subset E C 
X, the covering number N(E, d, S) is the smallest number of closed d-balls in X of 
radius 5 needed to cover E. 

Lemma 8.2. For any n r € 5jw,r an d S > 0, let B$ (7iy, S) denote the closed S q -ball 
with center ir r of radius S. Then there exist universal constants c, C, c > such that 
for allO < A <r, r < M -r and < S < A, 

(8.11) logN(Bs 2 (Tr r ,A)nS M ,r,ds 2 ,6) <mm(crM^-,r(M-r)log(^-j 

(8.12) logN(Bs 2 (TT r ,A)nS M , r ,d Soo ,d^ <mm(crM-^,r(M-r)log(^ 

as well as 

(8.13) log N (b S2 (7r r , A) nS M , r ,ds 2 ,S) > r(M-r)log(^\ and 

(8.14) logN(Bs 2 (n r ,A)nSM,r,d Soo ,s) > r(M - r) log (-^\ 

REMARK, (i) c 1 ' 2 equals cs u d times a uniform (in M) bound on the expected 
spectral norm of E/(vMcr), where cs u d is the universal constant of Sudakov's mi- 
noration which is bounded by 6 (see Ledoux (1996)). C is proportional to the 
ratio of the constants C jd which appear in the bounds of the metric entropy of 
the Grassmann manifold, see the proof below. Estimates on their values are not 
provided in Szarek (1982). 
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(ii) For arbitrary 1 < q < oo, the bound 

(8.15) 



r(M-r) log (^\ < log N(B Sq (-K r , A) nS M , r ,d Sq ,s) < r(M-r)log(^\ 

can be proved completely analogously, replacing below 52 by S q . 

PROOF. By the geometric formulation of Sudakov's minoration, the trace duality 
and the Cauchy-Schwarz inequaliy, 



8JlogN(B S2 (ir r ,A)nS M ,rds 2 ,6) < E sup tr(T(E/<r)) 

V V ' TeSs 2 (7r r ,A)nS M ,r 

<E sup tr((T-7r r )(E/cr)) +Etr(vr r (E/CT)) 

TeBs 2 (n r A)ns M , r 

< sup \\T-ir r \\sMfy<r\\s a o 

TGBs 2 (7r r ,A)nS M ,r 

< AVrM, 

which provides the first estimate in the minimum of (8.11). In order to prove the 
second term, note first that iV(i?s 2 (7i>, A) P, SM,r,ds 2 ,S) does not depend on the 
specific 7ty £ Sm.t- Similarly to the covering number, the capacity number D(E, d, S) 
is the largest number of elements of E having distance d strictly larger than 5 to 
each other. Using the relations between covering and packing (capacity) numbers 
(cf. Theorem 1.2.1, Dudley (1999)), 

N(B S2 (x r ,A)nSM,r,ds 2 ,6^N(S M ,r,ds 2 AA) 

< D(Bs 2 {Tr r ,A)nS M ,r,ds 2 ,8)D(SM,r,ds 2 AA)- 



Let {iTr \ • ■ • , Tr } be some maximal subset of Sm ,r with ds 2 (717 , 7iy ) > 4A for 
all I ^ me {l,...,fc 4 A}, ^4A = D(S M ,r,ds 2 , 4A). Then 

&4A 

J2D(BsM J \A)nS Mtr ,ds 2 ,8) < D(S M ,r,d S2 ,S/2) < N(S M , r ,ds 2 ,S/4), 

3=1 



that is 



N(Bs 2 (K r ,A)nS M ,r,ds 2 ,5) < N(S M , r ,d S2 ,5/A) / ' N{S M ,r,d s „±A). 

Now (8.11) follows by an immediate application of Proposition 8, Pajor (1998), 
which states 



/r '\r(M-r) , ,n' 

(|) < N(S M ,r,d Sq ,Sr^) < (^ 



for 1 < q < oo and universal constants c',C > 0. As concerns bound (8.12), first 

observe that 

(8.16) 

N(Bs 2 (nr,A)nSM,r,d Soo ,5^ < N(B S2 (7r r ,A)nS M ,r,ds 2 , 5/$JN(Bs 2 (0,l),d Soo , 6 
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for any > 0. Combining (8.16) with (8.11) and the bound 
log N(Bs 2 (0, 1), d Sao ,S) < Mb- 2 (cf. Pajor 1998, Lemma 4), we obtain 

logN(Bs 2 (7r r ,A)nSM,r,d Soo ,S^ < rM^- + Me~ 2 . 

Choosing 8 2 — 5A~ 1 r~ 1 / 2 gives the first term in the minimum on the RHS of (8.12). 
The proof of the second bound in (8.12) follows from (8.11) since ds^ < ds 2 - As 
concerns the reverse inequality (8.13), it is sufficient to note that 



D(SM,r,ds 2 ,A)N[Bs 2 (irr,A)nS M ,r,ds 2 ,6) > N(S M , r ,ds 2 ,S) 

which after applying the inequalities of Theorem 1.2.1 in Dudley (1999) again is 
lower bounded by 

m(u < MnC w x\ ^ N(S M , r ,ds 2 ,S) 
N(B S2 (n r ,A)nS M , r ,ds 2 ,6) > ^^^ A/2) > 

and the result follows as above by an application of Proposition 8, Pajor (1998). 
(8.14) follows analogously from ds^, > (2r)~ 1 / 2 ds 2 on Sm,t- D 

8.3. Slicing the Grassmann manifold. As has been seen in Section 6, the bound 
involves some term of the order a 2 r(M — r) at least. Since Proposition 4.1 yields 
in case C = the bound <5o,j\/,erV ~ ^rM, we decompose the supremum 

^sup (|[vf r (C + E)||g 2 - ||7rr(tf + E)||J 

ir r £S M ,r 

= sup < ||7f,.E|| s — ||7r r E|| s + 2tr(E'(7r r — ir r )C 
7r r eS M _ L 



(n ii 2 ii ii 7 

7T r L/ „ — 7T r O „ 
I 1 1 02 II II O? 



f II ~ II 2 II II 2 1 

< sup < 7r r E — k r E L } 

- 1.11 II S a II USaJ 



7r r £t?M,r 

+ sup I 2tr(E' (jf r - n^c") -( \\7T r C \\ 2 S2 - \\ir r C\\ 2 S2 
and treat these two suprema separately. Since 

- 2 rM 



E( sup {||7r r E||g -||7r r E|L|] < a 



Tr£5M,7- 

by Proposition 4.1 applied to the situation C = 0, it is sufficient to prove that the 
expectation of 

W := _ sup J2tr(E / (i r .-7r r )c) - ( || ir r C \\ 2 - ||^rC||^ 
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satisfies the bound of Theorem 5.1. For any C <E M MxM , rank(C) > r, let 
C = UKV denote some singular value decomposition, with diagonal matrix 
A = diag(Ai, A2, ..., Aa/). The singular values are assumed to be ordered in de- 
creasing magnitude. Let 

SmA 5 ) : = Y*t € S M ,r ■ llTTr - 7T r ||g a < V25J. 

In view of the inclusion provided by Proposition 8.1, we deviate slightly from the 
description at the beginning of Section 5 and conduct the proof of Theorem 5.1 
along two different decompositions of Sm,t- We shall decompose Sm.t into slices 

C c ,i(A k ) = Bc,i(Ak)\Bo,i(Ak-i) 

along a geometric grid Afc = 2~ k+2 r up to k < feo.i with ko.i specified below. First, 
we take 

(8.17) B c ,i(A fc ) := {i r £% r : lk r C||| a - ||7r r C||| a > A fc A? - ^ A 2 }. 

i— r+1 

In a second step, whenever A r > A r+ i, we choose 

(8.18) BcA^k) ~ [*r G S M ,r : lk r C||| 2 - ||?r r C||| a > (A? - A 2 +1 )A fc }. 

By Proposition 8.1, Cc,i(Ak) c <Sjif,r(2Afc). Recall that by construction, ||7r r C||| — 

lkrC||| 2 > A fc A 2 - Y.Zr+i X l if ^ € Cc,i(A fe ). Whenever A r+ i > 0, define 
(8.19) 



i=r+l J 



fco 1 := argmax < A^A^ — > A^ > -AfeA r > , and set fco 1 := 00 if A r+ i = 0. 
feeNo I ' -1 

Define fco.2 := 00. Denoting 



s 2 



W k ,i ■= sup__ l2tr(E'(n r -TT r )c) - (\\ir r C\\ 2 S2 - \\n r C"' 2 

W% ti := sup^ 2trfE'(i r -7r r .)C* 

s r ec c .,(A fc ) 

and 

W kol := sup__ 2tr(E'(7r. r -7r r )C 

7f r e5M,r(A fcol ) 

with Aqo := 0, we obtain the expansions 

fE fc < fe0 , 1 E(0V(W° 1 -O M )) + EW k0A 
(8.20) EVF < I 

lE feeN E(ov(w fc ° 2 -o fe)2 ; 

where O fei ^= A fc A, 2 - Eil r +i A ? and ^M = (A, 2 - A 2 +1 )A fc . Note that EW^ = 0, 
that is, EWfe ! = if A r+ i = 0. 
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PROOF OF Theorem 5.1. Note that each W^ t is the supremum over some 
Gaussian process, and 



sup^ Varl trfE'(7r r -7r r )CJ I < sup _ cr 2 Aj||7r r -7r I .||g 2 < 2a 2 X 2 A k =: a 2 k , 

where the last equality follows from the inclusion Cc,i(Ajfe) C 5M,r(2Afc). By Lemma 
8.2 and ^-chaining over SM,r{2A k ), 

E(OV^) < /" ( TA 1 A^ /2 (21og(7V(5 A/ , r (A fe ),d S2 ,2A^ /2 ( 5))) 1/2 d ( 5 



k 

2 



I.e., there exists some constant c > 0, independent of M, r, a and C, such that 
(8.21) E(OVW^) < caX^y/riM-r), 

and by the Borell (1975) - Sudakov and Circl'son (1974) inequality, 

(8.22) 

p(V fe °, > caXiAl /2 y/r(M^r) + v^aAiA^ 2 -^ J < exp(-? ? ) for any 77 > 0. 
(The case i = 1) With the help of (8.22) we evaluate the first term 

]T e^vO^-jv)) < Yl E(0V(W° 1 -iA,A 2 

fc<fc ,l fc<fco,l 

•1/2 



in expansion (8.20). For the ease of notation, define Aft := ccrAiA fc y/r(M — r). 

— 1/2 
Since A& depends only linearly on A fc ' , while 

IKrC||| a - ||7rrC||| a > -A fc A 2 for all n r e C c (A fc ) and fe < fc ,i, 
define the additional auxiliary integer 

fc^ := arg max \ -A k X 2 r - A k > -jA k X 2 r \ , 

and set fc^ = if the relation never holds. If k\ = 0, then 



VrX 2 r < Aca\ lx /r{M -r), 
and the bound 



E sup f\\n r (C + m 2 s 2 - hr(C + E)\\ 2 S2 ) < a 2 rM(l + ^= 
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follows immediately by Corollary 4.2 (i). Thus, we assume fe* > 1. We first treat 
the case k < k*[ . By the representation formula for the expectation of non- negative 
random variables and the definition of k\ , 



E 



(0 V «1 " l A ^r)) = l°° P ( W °.l ~ l A ^r > U ) du 



DC- 



: / PlZ%-A k >u+^A k \l)<\u 



for all k < k{. Next, by (8.22) and with A k = {2a k )- 1 \A k X 2 r 



It ,o\ . . f°° ( {u+\A k \ 



Pi [ Wl 1 >u+-A k Xljdu < J^ expl-^ ±-£ rJ 1 du 



,2^ 



(8-23) < _^_exp(-4/2 

(8.24) < 



(2 + 40(1 + ^/2)' 



where we used V{N > x) < (2 + a;)" 1 exp(-x 2 /2) for N ~ W(0, 1) in (8.23) and 
the inequality exp(-x) < 1/(1 + x) Vx > in (8.24). Thus 

(8.25) £E(0V(W fe VA fe -~A*A*)) < E (l "" 






+ A 2 k /2 



We evaluate (8.25). Recall a k = \/2a\iA k . Then, bounding the sum by an intgral 



and by change of variables 



a k „ v^ /^ ■> X1/2A , A fc A 



l™^** 1 



' 4 */ 2 " feS V 128<^A?A 



fc<fcj K/ feeN 



v2\4 



fc V 



H 



A 



4 \ -1 



< crAi / [1 + x 2 —^ ) dx 

JO V CT 2 X 1 , 



\2 /-oo 

< a 2 ^ / (1 + arTMs. 



^ ZL I , 1 .2,- 1. 

A r Jo 

In order to estimate the expression X)fc>fc*+i E(0 V (W^j — (l/2)AfcA 2 )) we need 
to determine a lower bound on A:^ in dependence of Ai, A,-. By its definition, k > k* 
implies 

(8.26) caAiA^VK^-0 > A fc A^/4 

as long as k\ < fe ,i. Recalling by (8.21) that E(0 V Wg >x ) < A fc , and using (8.26) 
and the representation for the tail of the geometric series, 

feo.i— 1 .2 

£ B(0VW£0 S E Af ^A 1V ^M^ < - 2 ^ 
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As concerns the second term EWt 1 in expansion (8.20), we obtain by definition 
(8.19) of fco.i) if ^0,1 < 00; 

VW k0A < A ko>1 aX^r(M-r) < ^Syi^A _ aXir ^ (M _ r) . 

If koi — oo, then rank(C) = r and the rank-r-projection 7iy is unique, i.e. W ka x = 0. 
Collecting things together, this proves the bound 

(8.27) 6c,M,o*,r <<J 2 rM(l + ll). 

(The case i = 2) We assume subsequently that A r > A r +i, because otherwise 
III = oo and the result follows with (8.27). We proceed similar to the case i = 1 
above, but with fco,2 := oo and the auxiliary integer 

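$$k_2^*:=\operatorname{argmax}\Big\{k\in\mathbb N:\ (\lambda_r^2-\lambda_{r+1}^2)\Delta_k-c\,\sigma\lambda_1\Delta_k^{1/2}\sqrt{r(M-r)}\ge\tfrac12(\lambda_r^2-\lambda_{r+1}^2)\Delta_k\Big\},$$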

where $k_2^*:=0$ if this relation never holds. The sum

$$\sum_{k<k_2^*}\mathbb E\Big(0\vee\big(W_{k,2}-\tfrac12(\lambda_r^2-\lambda_{r+1}^2)\Delta_k\big)\Big)$$

can be treated analogously to the case i = 1, with $\lambda_r^2-\lambda_{r+1}^2$ in place of $\lambda_r^2$. Similarly
to (8.26), $k>k_2^*$ implies

$$c\,\sigma\lambda_1\Delta_k^{1/2}\sqrt{r(M-r)}>\Delta_k(\lambda_r^2-\lambda_{r+1}^2)/2.$$

Since $\mathbb E(0\vee W_{k,2})\le c\,\sigma\lambda_1\Delta_k^{1/2}\sqrt{r(M-r)}$, the representation for the tail of the geometric series yields,
as above,

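$$\sum_{k>k_2^*}\mathbb E(0\vee W_{k,2})\le\sum_{k>k_2^*}c\,\sigma\lambda_1\Delta_k^{1/2}\sqrt{r(M-r)}\lesssim\frac{\sigma^2\lambda_1^2\,r(M-r)}{\lambda_r^2-\lambda_{r+1}^2}.$$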

Combining the cases i = 1 and i = 2 completes the proof of the Theorem. □

9. Proof of Theorem 6.2. Recall the definition $\Delta_k:=2^{-k+2}r$, $k\in\mathbb N$. Since
$\|\tilde\pi_s(\mathrm{Id}-\pi_s)\|_{S_2}^2=\|(\mathrm{Id}-\tilde\pi_s)\pi_s\|_{S_2}^2$ and $\tilde\pi_s-\pi_s=(\mathrm{Id}-\pi_s)\tilde\pi_s-\pi_s(\mathrm{Id}-\tilde\pi_s)$ is an
orthogonal decomposition, observe first that



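$$\begin{aligned}\|\pi_sC_{\alpha,s}\|_{S_2}^2-\|\tilde\pi_sC_{\alpha,s}\|_{S_2}^2&=\alpha^2\big(\|\pi_s\|_{S_2}^2-\|\tilde\pi_s\pi_s\|_{S_2}^2\big)\\&=\alpha^2\|(\mathrm{Id}-\tilde\pi_s)\pi_s\|_{S_2}^2\\&=\alpha^2\|\tilde\pi_s(\mathrm{Id}-\pi_s)\|_{S_2}^2=\frac{\alpha^2}{2}\|\tilde\pi_s-\pi_s\|_{S_2}^2,\end{aligned}$$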
that is,

$$\mathbb E\|\pi_s(C_{\alpha,s}+E)\|_{S_2}^2-\mathbb E\|\tilde\pi_s(C_{\alpha,s}+E)\|_{S_2}^2\in(\alpha^2\Delta_{k+1},\alpha^2\Delta_k]\iff\|\tilde\pi_s-\pi_s\|_{S_2}^2\in(2\Delta_{k+1},2\Delta_k].$$
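These identities are easy to confirm numerically. The sketch below assumes that $C_{\alpha,s}=\alpha U\,\mathrm{Id}_sV'$ with orthogonal $U,V$ and $\mathrm{Id}_s$ the diagonal matrix with $s$ leading ones, as the notation $C_{\alpha,M-s}:=\alpha U(\mathrm{Id}-\mathrm{Id}_s)V'$ introduced below suggests. It takes $\pi_s=U\,\mathrm{Id}_sU'$, draws a random rank-$s$ projection for $\tilde\pi_s$, and checks $\|\pi_sC_{\alpha,s}\|_{S_2}^2-\|\tilde\pi_sC_{\alpha,s}\|_{S_2}^2=\alpha^2\|(\mathrm{Id}-\tilde\pi_s)\pi_s\|_{S_2}^2=\frac{\alpha^2}{2}\|\tilde\pi_s-\pi_s\|_{S_2}^2$, together with $\|\tilde\pi_s(\mathrm{Id}-\pi_s)\|_{S_2}^2=\|(\mathrm{Id}-\tilde\pi_s)\pi_s\|_{S_2}^2$. The dimensions and $\alpha$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
M, s, alpha = 7, 3, 2.5

def random_orthogonal(m: int) -> np.ndarray:
    """Orthogonal factor of the QR decomposition of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((m, m)))
    return q

def hs2(A: np.ndarray) -> float:
    """Squared Hilbert-Schmidt (Frobenius) norm."""
    return float(np.sum(A * A))

I = np.eye(M)
U, V = random_orthogonal(M), random_orthogonal(M)
Id_s = np.diag([1.0] * s + [0.0] * (M - s))
C = alpha * U @ Id_s @ V.T              # assumed form of C_{alpha,s}
pi = U @ Id_s @ U.T                      # pi_s: projection onto the range of C
Q, _ = np.linalg.qr(rng.standard_normal((M, s)))
pi_t = Q @ Q.T                           # an arbitrary competing rank-s projection (pi tilde)

gap = hs2(pi @ C) - hs2(pi_t @ C)
checks = [
    np.isclose(gap, alpha**2 * hs2((I - pi_t) @ pi)),
    np.isclose(gap, alpha**2 / 2 * hs2(pi_t - pi)),
    np.isclose(hs2(pi_t @ (I - pi)), hs2((I - pi_t) @ pi)),
]
print(checks)
```

Since $\mathbb E\|\pi E\|_{S_2}^2=\sigma^2M\,s$ for every rank-$s$ projection $\pi$, the noise contributes equally to both expectations, which is why only the deterministic gap above matters for the equivalence.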

Recall the definition 

$$\mathcal G_{M,s}(\delta,C)=\big\{\tilde\pi_s\in\mathcal S_{M,s}:\ \|\pi_sC\|_{S_2}^2-\|\tilde\pi_sC\|_{S_2}^2\le\delta\big\}.$$

Note at this point that, with $C_{\alpha,M-s}:=\alpha U(\mathrm{Id}-\mathrm{Id}_s)V'$, we have $\tilde\pi_r\in\mathcal G_{M,r}(\delta,C_{\alpha,r})\iff(\mathrm{Id}-\tilde\pi_r)\in\mathcal G_{M,M-r}(\delta,C_{\alpha,M-r})$. Define



$$k^{**}:=\operatorname{argmax}\big\{k\in\mathbb N:\ \alpha^2\Delta_k\ge d\,\sigma^2s(M-s)\big\}$$

for some $d>0$ to be specified later, and let

$$\mathcal D_s(\alpha):=\mathcal G_{M,s}\big(\alpha^2\Delta_{k^{**}+1},C_{\alpha,s}\big).$$
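Since $\Delta_k$ halves from one index to the next, $k^{**}$ can be computed by a direct scan. The helper below is a hypothetical illustration: the name `k_double_star` and the convention of returning $0$ when no $k\ge1$ qualifies are assumptions. It evaluates $k^{**}=\operatorname{argmax}\{k\in\mathbb N:\alpha^2\Delta_k\ge d\sigma^2s(M-s)\}$ with $\Delta_k=2^{-k+2}r$ and shows how $k^{**}$ grows with $\alpha$. By construction, $\alpha^2\Delta_{k^{**}+1}<d\sigma^2s(M-s)$, which is what makes every $\tilde\pi_s\in\mathcal D_s(\alpha)$ cost at most $d\sigma^2s(M-s)$ in the deterministic part.

```python
def k_double_star(alpha: float, sigma: float, d: float, r: int, s: int, M: int) -> int:
    """Largest k >= 1 with alpha^2 * Delta_k >= d * sigma^2 * s * (M - s), where
    Delta_k = 2^{-k+2} r; returns 0 if no k qualifies (assumed convention).
    Requires d, sigma > 0 and 0 < s < M so that the scan terminates."""
    target = d * sigma**2 * s * (M - s)
    k = 0
    while alpha**2 * 2.0 ** (-(k + 1) + 2) * r >= target:
        k += 1
    return k

values = [k_double_star(a, sigma=1.0, d=1.0, r=2, s=2, M=10) for a in (1.0, 10.0, 100.0, 1000.0)]
print(values)  # [0, 5, 12, 18]
```

Each tenfold increase of $\alpha$ raises $k^{**}$ by roughly $2\log_2 10\approx6.6$, in line with the growth $k^{**}\to\infty$ as $\alpha\to\infty$.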
Note that $k^{**}\to\infty$ as $\alpha\to\infty$. It holds that

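$$\begin{aligned}&\mathbb E\Big(\sup_{\tilde\pi_s\in\mathcal S_{M,s}}\|\tilde\pi_s(C_{\alpha,s}+E)\|_{S_2}^2-\|\pi_s(C_{\alpha,s}+E)\|_{S_2}^2\Big)\\&\ge\mathbb E\Big(\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\|\tilde\pi_sE\|_{S_2}^2-\|\pi_sE\|_{S_2}^2+2\alpha\,\mathrm{tr}\big(E'(\tilde\pi_s-\pi_s)\pi_s\big)-d\,s(M-s)\sigma^2\Big)\\&\ge\mathbb E\Big(\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}2\alpha\,\mathrm{tr}\big(E'(\tilde\pi_s-\pi_s)\pi_s\big)-d\,s(M-s)\sigma^2\Big)-\mathbb E\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\Big|\|\tilde\pi_sE\|_{S_2}^2-\|\pi_sE\|_{S_2}^2\Big|.\end{aligned}$$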

Because of $\limsup_{\alpha\to\infty}\mathbb E\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\big|\|\tilde\pi_sE\|_{S_2}^2-\|\pi_sE\|_{S_2}^2\big|=0$, it remains to prove
that

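$$\liminf_{\alpha\to\infty}\max_{s\in\{r,M-r\}}\mathbb E\Big(\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}2\alpha\,\mathrm{tr}\big(E'(\tilde\pi_s-\pi_s)\pi_s\big)-d\,s(M-s)\sigma^2\Big)$$

$$=\sigma^2\liminf_{\alpha\to\infty}\max_{s\in\{r,M-r\}}\mathbb E\Big(\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}2(\alpha/\sigma)\,\mathrm{tr}\big((E/\sigma)'\tilde\pi_s\pi_s\big)-d\,s(M-s)\Big)\tag{9.1}$$

$$\gtrsim\sigma^2r(M-r).$$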
First, we have 

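$$\begin{aligned}\mathbb E\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\mathrm{tr}\big((E/\sigma)'\tilde\pi_s\pi_s\big)&=\mathbb E\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\mathrm{tr}\big((E/\sigma)'(\mathrm{Id}-\tilde\pi_s)\pi_s\big)\\&=\mathbb E\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\Big(\mathrm{tr}\big((E/\sigma)'\tilde\pi_s\big)-\mathrm{tr}\big((E/\sigma)'\tilde\pi_s(\mathrm{Id}-\pi_s)\big)\Big),\end{aligned}$$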




which implies 

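$$\mathbb E\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\mathrm{tr}\big((E/\sigma)'(\mathrm{Id}-\tilde\pi_s)\pi_s\big)+\mathbb E\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\mathrm{tr}\big((E/\sigma)'\tilde\pi_s(\mathrm{Id}-\pi_s)\big)\ge\mathbb E\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\mathrm{tr}\big((E/\sigma)'\tilde\pi_s\big).\tag{9.2}$$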
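The argument leading to (9.2) rests on two pointwise facts, valid for every matrix $E$ and all orthogonal projections $\pi_s,\tilde\pi_s$. The first is $\mathrm{tr}(E'\tilde\pi_s)=\mathrm{tr}(E'\tilde\pi_s\pi_s)+\mathrm{tr}(E'\tilde\pi_s(\mathrm{Id}-\pi_s))$. The second is $\mathrm{tr}(E'\tilde\pi_s\pi_s)=\mathrm{tr}(E'\pi_s)-\mathrm{tr}(E'(\mathrm{Id}-\tilde\pi_s)\pi_s)$. These combine with $\sup(X+Y)\le\sup X+\sup Y$ and the sign symmetry $E\mapsto-E$ of the Gaussian matrix. The Monte Carlo sketch below replaces $\mathcal D_s(\alpha)$ by a finite family of random rank-$s$ projections, an assumption made only so that the suprema are computable. It then checks the identities and the subadditivity of the sample means of the suprema.

```python
import numpy as np

rng = np.random.default_rng(1)
M, s, n_proj, n_draws = 6, 2, 25, 200
I = np.eye(M)

def rank_s_projection(m: int, k: int) -> np.ndarray:
    """Orthogonal projection onto the span of k random Gaussian vectors in R^m."""
    Q, _ = np.linalg.qr(rng.standard_normal((m, k)))
    return Q @ Q.T

def tr(E: np.ndarray, A: np.ndarray) -> float:
    """tr(E' A)."""
    return float(np.trace(E.T @ A))

pi = rank_s_projection(M, s)                                # stands in for pi_s
family = [rank_s_projection(M, s) for _ in range(n_proj)]   # finite stand-in for D_s(alpha)

identities_hold = True
sup_full, sup_a, sup_b = [], [], []
for _ in range(n_draws):
    E = rng.standard_normal((M, M))
    for pt in family:
        # tr(E' pi~) = tr(E' pi~ pi) + tr(E' pi~ (Id - pi))
        identities_hold &= bool(np.isclose(tr(E, pt), tr(E, pt @ pi) + tr(E, pt @ (I - pi))))
        # tr(E' pi~ pi) = tr(E' pi) - tr(E' (Id - pi~) pi)
        identities_hold &= bool(np.isclose(tr(E, pt @ pi), tr(E, pi) - tr(E, (I - pt) @ pi)))
    sup_full.append(max(tr(E, pt) for pt in family))
    sup_a.append(max(tr(E, pt @ pi) for pt in family))
    sup_b.append(max(tr(E, pt @ (I - pi)) for pt in family))

# sup(X + Y) <= sup X + sup Y holds draw by draw, hence also for the sample means
subadditive = np.mean(sup_a) + np.mean(sup_b) >= np.mean(sup_full) - 1e-9
print(identities_hold, subadditive)
```

The sign symmetry itself only holds in distribution, so it is not checked draw by draw here. Only the pathwise ingredients are verified.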

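In case $M\in2\mathbb N$ and $s=M/2$, since $\|\tilde\pi_s-\pi_s\|_{S_2}^2=\|(\mathrm{Id}-\tilde\pi_s)-(\mathrm{Id}-\pi_s)\|_{S_2}^2$,
both expectations on the left-hand side of inequality (9.2) are identical for reasons of
symmetry, which leads to

$$\mathbb E\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\mathrm{tr}\big((E/\sigma)'\tilde\pi_s\pi_s\big)\ge\frac12\,\mathbb E\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\mathrm{tr}\big((E/\sigma)'\tilde\pi_s\big)\quad\text{in case } s=M/2.$$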

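Although the polar decomposition of $(\mathrm{Id}-\tilde\pi_s)\pi_s$ and $\tilde\pi_s(\mathrm{Id}-\pi_s)$, respectively,
suggests a similar symmetry argument, we do not yet have a rigorous treatment of an
argument of this type, and therefore remain with the inequality

$$\max_{s\in\{r,M-r\}}\mathbb E\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\mathrm{tr}\big((E/\sigma)'(\mathrm{Id}-\tilde\pi_s)\pi_s\big)\ge\frac12\,\mathbb E\sup_{\tilde\pi_r\in\mathcal D_r(\alpha)}\mathrm{tr}\big((E/\sigma)'\tilde\pi_r\big)$$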

only. Note at this point that 

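$$\begin{aligned}\mathbb E\sup_{\tilde\pi_r\in\mathcal D_r(\alpha)}\mathrm{tr}\big((E/\sigma)'\tilde\pi_r\big)&=\mathbb E\sup_{\tilde\pi_{M-r}\in\mathcal D_{M-r}(\alpha)}\mathrm{tr}\big((E/\sigma)'(\mathrm{Id}-\tilde\pi_{M-r})\big)\\&=\mathbb E\sup_{\tilde\pi_{M-r}\in\mathcal D_{M-r}(\alpha)}\mathrm{tr}\big((E/\sigma)'\tilde\pi_{M-r}\big),\end{aligned}$$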

where the second equality follows from invariance of the above expression under orthogonal
transformation. By Sudakov's minoration and the bound (8.13) of Lemma 8.2,



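$$\mathbb E\Big(\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}\mathrm{tr}\big((E/\sigma)'\tilde\pi_s\big)\Big)\gtrsim\delta\sqrt{\log N\big(\mathcal D_r(\alpha),d_{S_2},\delta\big)}\ge\delta\sqrt{r(M-r)}\,\sqrt{\log\Big(\frac{c\,\Delta_{k^{**}}^{1/2}}{\delta}\Big)}$$

for any $0<\delta<c\,\Delta_{k^{**}}^{1/2}/\sqrt2$, $s\in\{r,M-r\}$, where we used that

$$\mathbb E\Big(\sup_{\tilde\pi_r\in\mathcal D_r(\alpha)}\mathrm{tr}\big((E/\sigma)'(\mathrm{Id}-\tilde\pi_r)\big)\Big)=\mathbb E\Big(\sup_{\tilde\pi_r\in\mathcal D_r(\alpha)}\mathrm{tr}\big((E/\sigma)'\tilde\pi_r\big)\Big)$$

and

$$N\big(\mathcal D_r(\alpha),d_{S_2},\delta\big)\ge\Big(\frac{c\,\Delta_{k^{**}}^{1/2}}{\delta}\Big)^{r(M-r)}$$

with the constant $c$ of Lemma 8.2. The choice $\delta=c\,\Delta_{k^{**}}^{1/2}/8$ yields finally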



$$\mathbb E\Big(\sup_{\tilde\pi_r\in\mathcal D_r(\alpha)}\mathrm{tr}\big((E/\sigma)'\tilde\pi_r\big)\Big)\ge K\,\Delta_{k^{**}}^{1/2}\sqrt{r(M-r)}\tag{9.3}$$

for some constant $K>0$ which does not depend on $\alpha$, $M$ and $\sigma$. Thus,

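$$\text{(9.1)}=\sigma^2\liminf_{\alpha\to\infty}\max_{s\in\{r,M-r\}}\mathbb E\Big(\sup_{\tilde\pi_s\in\mathcal D_s(\alpha)}2(\alpha/\sigma)\,\mathrm{tr}\big((E/\sigma)'\tilde\pi_s\pi_s\big)-d\,r(M-r)\Big)$$

$$\ge\sigma^2\liminf_{\alpha\to\infty}\Big(\mathbb E\sup_{\tilde\pi_r\in\mathcal D_r(\alpha)}(\alpha/\sigma)\,\mathrm{tr}\big((E/\sigma)'\tilde\pi_r\big)-d\,r(M-r)\Big).\tag{9.4}$$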



Choosing now $d$ in the definition of $k^{**}$ as large as possible such that $K\sqrt d-d\ge K\sqrt d/2$,
and plugging the lower bound (9.3) into (9.4), proves the Theorem. □
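The arithmetic behind this final choice of $d$ can be spelled out as a short sketch. It assumes that (9.3) reads $\mathbb E\sup_{\tilde\pi_r}\mathrm{tr}((E/\sigma)'\tilde\pi_r)\ge K\Delta_{k^{**}}^{1/2}\sqrt{r(M-r)}$, and it uses $\alpha^2\Delta_{k^{**}}\ge d\sigma^2r(M-r)$, which follows from the definition of $k^{**}$ with $s=r$:

$$\frac{\alpha}{\sigma}\,K\Delta_{k^{**}}^{1/2}\sqrt{r(M-r)}-d\,r(M-r)\ \ge\ \big(K\sqrt d-d\big)\,r(M-r),\qquad K\sqrt d-d\ge\tfrac12K\sqrt d\iff d\le\tfrac{K^2}{4}.$$

The largest admissible choice $d=K^2/4$ therefore bounds the right-hand side of (9.4) from below by $\sigma^2(K^2/4)\,r(M-r)$.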